1
Huang YZ, Chen YM, Lin CC, Chiu HY, Chang YC. A nursing note-aware deep neural network for predicting mortality risk after hospital discharge. Int J Nurs Stud 2024; 156:104797. [PMID: 38788263 DOI: 10.1016/j.ijnurstu.2024.104797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 04/08/2024] [Accepted: 05/03/2024] [Indexed: 05/26/2024]
Abstract
BACKGROUND ICU readmissions and post-discharge mortality pose significant challenges. Previous studies used EHRs and machine learning models, but mostly focused on structured data. Nursing records contain crucial unstructured information, but their utilization is challenging. Natural language processing (NLP) can extract structured features from clinical text. This study proposes the Crucial Nursing Description Extractor (CNDE) to predict post-ICU discharge mortality rates and identify high-risk patients for unplanned readmission by analyzing electronic nursing records. OBJECTIVE To develop a deep neural network (NurnaNet) able to interpret nursing records, combined with a biomedical and clinical pre-trained language model (BioClinicalBERT), to analyze the electronic health records (EHRs) in the MIMIC-III dataset and predict patients' risk of death within six months and two years. DESIGN A cohort and system development design was used. SETTING(S) Data were extracted from MIMIC-III, a database of critically ill patients in the US between 2001 and 2012. PARTICIPANTS We calculated patients' age using admission time and date of birth information from the MIMIC dataset. Patients under 18 or over 89 years old, or who died in the hospital, were excluded. We analyzed 16,973 nursing records from patients' ICU stays. METHODS We developed the Crucial Nursing Description Extractor (CNDE), which extracts key content from text. We use the log-likelihood ratio to extract keywords and combine them with BioClinicalBERT. We predict the survival of discharged patients after six months and two years and evaluate the performance of the model using precision, recall, the F1-score, the receiver operating characteristic (ROC) curve, the area under the curve (AUC), and the precision-recall (PR) curve.
RESULTS NurnaNet achieved F1-scores of 0.67030 and 0.70874 for the six-month and two-year predictions, respectively. Compared to using BioClinicalBERT alone, performance improved by 2.05% and 1.08% for predictions within six months and two years, respectively. CONCLUSIONS CNDE can effectively reduce long-form records and extract key content. NurnaNet achieves a good F1-score when analyzing nursing records, which helps to identify patients' risk of death after leaving the hospital so that regular follow-up and treatment plans can be adjusted as soon as possible.
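The abstract above names the log-likelihood ratio as the keyword-extraction statistic but gives no implementation details. As a rough illustration only, the following sketch scores terms with Dunning's G² statistic by comparing a target corpus against a reference corpus. The function names, the tokenised inputs, and the choice to keep only terms over-represented in the target corpus are assumptions made for this example, not the authors' CNDE pipeline.

```python
import math
from collections import Counter


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: count of the term in the target corpus
    k12: count of the term in the reference corpus
    k21: count of all other tokens in the target corpus
    k22: count of all other tokens in the reference corpus
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            obs = observed[i][j]
            if obs > 0:  # the term 0 * log(0) is taken as 0
                expected = rows[i] * cols[j] / total
                g2 += obs * math.log(obs / expected)
    return 2.0 * g2


def rank_keywords(target_tokens, reference_tokens, top_n=10):
    """Rank terms that are over-represented in the target corpus (hypothetical helper)."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_target, n_reference = len(target_tokens), len(reference_tokens)
    scored = []
    for term, k11 in target.items():
        k12 = reference[term]
        # Skip terms that are not relatively more frequent in the target corpus.
        if k11 * n_reference <= k12 * n_target:
            continue
        score = log_likelihood_ratio(k11, k12, n_target - k11, n_reference - k12)
        scored.append((term, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

For instance, calling `rank_keywords` with tokens from notes of patients who died as the target and tokens from survivors as the reference would surface terms characteristic of the first group; how the study actually defined its corpora is not stated in the abstract.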
Affiliation(s)
- Yong-Zhen Huang
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan; Department of Nursing, National Taiwan University Cancer Center, Taipei, Taiwan.
- Yan-Ming Chen
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan.
- Chih-Cheng Lin
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan.
- Hsiao-Yean Chiu
  - School of Nursing, College of Nursing, Taipei Medical University, Taipei, Taiwan; Department of Nursing, Taipei Medical University Hospital, Taipei, Taiwan; Research Center of Sleep Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan.
- Yung-Chun Chang
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan; Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan.
2
Gliwska E, Barańska K, Maćkowska S, Różańska A, Sobol A, Spinczyk D. The Use of Natural Language Processing for Computer-Aided Diagnostics and Monitoring of Body Image Perception in Patients with Cancers. Cancers (Basel) 2023; 15:5437. [PMID: 38001696 PMCID: PMC10670138 DOI: 10.3390/cancers15225437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 11/08/2023] [Accepted: 11/13/2023] [Indexed: 11/26/2023] Open
Abstract
BACKGROUND Head and neck cancers (H&NCs) constitute a significant part of all cancer cases. H&NC patients experience unintentional weight loss, poor nutritional status, or speech disorders. Medical interventions affect appearance and interfere with patients' self-perception of their bodies. Psychological consultations are not affordable due to limited time. METHODS We used NLP to analyze the basic emotion intensity, sentiment about one's body, characteristic vocabulary, and potential areas of difficulty in free notes. The emotion intensity research uses the extended NAWL dictionary developed using word embedding. The sentiment analysis used a hybrid approach: a sentiment dictionary and a deep recursive network. The part-of-speech tagging and domain rules defined by a psycho-oncologist determine the distinct language traits. Potential areas of difficulty were analyzed using the dictionaries method with word polarity to define a given area and the presentation of a note using bag-of-words. Here, we applied the LSA method using SVD to reduce dimensionality. A total of 50 cancer patients requiring enteral nutrition participated in the study. RESULTS The results confirmed the complexity of emotions in patients with H&NC in relation to their body image. A negative attitude towards body image was detected in most of the patients. The method presented in the study appeared to be effective in assessing body image perception disturbances, but it cannot be used as the sole indicator of body image perception issues. LIMITATIONS The main problem in the research was the fairly wide age range of participants, which explains the potential diversity of vocabulary. CONCLUSIONS The combination of the attributes of a patient's condition, possible to determine using the method for a specific patient, can indicate the direction of support for the patient, relatives, direct medical personnel, and psycho-oncologists.
Affiliation(s)
- Elwira Gliwska
  - Department of Food Market and Consumer Research, Institute of Human Nutrition Sciences, Warsaw University of Life Sciences (WULS-SGGW), 159C Nowoursynowska Street, 02-776 Warsaw, Poland
  - Cancer Epidemiology and Primary Prevention Department, The Maria Sklodowska-Curie National Research Institute of Oncology, 15B Wawelska Street, 02-034 Warsaw, Poland
- Klaudia Barańska
  - Department of Medical Informatics and Artificial Intelligence, Faculty of Biomedical Engineering, Silesian University of Technology, Roosevelta 40, 41-800 Zabrze, Poland
  - Polish National Cancer Registry, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
- Stella Maćkowska
  - Department of Medical Informatics and Artificial Intelligence, Faculty of Biomedical Engineering, Silesian University of Technology, Roosevelta 40, 41-800 Zabrze, Poland
- Agnieszka Różańska
  - Department of Medical Informatics and Artificial Intelligence, Faculty of Biomedical Engineering, Silesian University of Technology, Roosevelta 40, 41-800 Zabrze, Poland
- Adrianna Sobol
  - Department of Oncological Propaedeutics, Medical University of Warsaw, 00-518 Warsaw, Poland
- Dominik Spinczyk
  - Department of Medical Informatics and Artificial Intelligence, Faculty of Biomedical Engineering, Silesian University of Technology, Roosevelta 40, 41-800 Zabrze, Poland
3
Kotevski DP, Vajdic CM, Field M, Smee RI. Inter-hospital variation in data collection, radiotherapy treatment, and survival in patients with head and neck cancer: A multisite study. Radiother Oncol 2023; 188:109843. [PMID: 37543056 DOI: 10.1016/j.radonc.2023.109843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 06/14/2023] [Accepted: 07/27/2023] [Indexed: 08/07/2023]
Abstract
BACKGROUND AND PURPOSE Inter-hospital inequalities in head and neck cancer (HNC) survival may exist due to variation in radiotherapy treatment-related factors. This study investigated inter-hospital variation in data collection, primary radiotherapy treatment, and survival in HNC patients from an Australian setting. MATERIALS AND METHODS Data collected in oncology information systems (OIS) from seven Australian hospitals was extracted for 3,182 adults treated with curative radiotherapy, with or without surgery or chemotherapy, for primary, non-metastatic squamous cell carcinoma of the head and neck (2000-2017). Death data was sourced from the National Death Index using record linkage. Multivariable Cox regression was used to assess the association between survival and hospital. RESULTS Inter-hospital variation in data collection, primary radiotherapy dose, and five-year HNC-related death was detected. Completion of eleven fields ranged from 66%-98%. Primary radiotherapy treated Tis-T1N0 glottic and any stage oral cavity and oropharynx cancers received significantly different time-corrected biologically equivalent dose in two gray fractions (EQD2T) by hospital, with observed deviation from Australian radiotherapy guidelines. Increased EQD2T dose was associated with a reduced risk of five-year HNC-related death in all patients and those treated with primary radiotherapy. Hospital, tumour site, and T and N classification were also identified as independent prognostic factors for five-year HNC-related death in all patients treated with radiotherapy. CONCLUSION Unexplained variation exists in HNC-related death in patients treated at Australian hospitals. Available routinely collected data in OIS are insufficient to explain variation in survival. Innovative data collection, extraction, and classification practices are needed to inform clinical practice.
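The abstract above reports dose as a time-corrected biologically equivalent dose in two gray fractions (EQD2T) without stating the formula or parameters used. As a hedged sketch, the code below applies the standard linear-quadratic conversion EQD2 = D × (d + α/β) / (2 + α/β), where D is the total dose and d the dose per fraction, and then subtracts a simple linear repopulation term for treatment days beyond a reference duration. The function names are hypothetical, and the α/β value, reference duration, and daily dose-loss figure in the usage note are illustrative inputs only, not values taken from the study.

```python
def eqd2(total_dose_gy, n_fractions, alpha_beta_gy):
    """Equivalent dose in 2 Gy fractions under the linear-quadratic model."""
    if n_fractions <= 0:
        raise ValueError("n_fractions must be positive")
    dose_per_fraction = total_dose_gy / n_fractions
    return total_dose_gy * (dose_per_fraction + alpha_beta_gy) / (2.0 + alpha_beta_gy)


def eqd2_time_corrected(total_dose_gy, n_fractions, alpha_beta_gy,
                        overall_days, reference_days, dose_lost_per_day_gy):
    """EQD2 minus a linear repopulation penalty (assumed form of the time correction).

    The penalty applies only to days beyond reference_days; all three time
    parameters must be supplied by the caller because the source does not state them.
    """
    extra_days = max(0, overall_days - reference_days)
    return eqd2(total_dose_gy, n_fractions, alpha_beta_gy) - extra_days * dose_lost_per_day_gy
```

With illustrative inputs, `eqd2(70, 35, 10)` gives 70 Gy because the schedule already uses 2 Gy fractions, while `eqd2(55, 20, 10)` gives 58.4375 Gy for a hypofractionated schedule with 2.75 Gy fractions.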
Affiliation(s)
- Damian P Kotevski
  - Department of Radiation Oncology, Prince of Wales Hospital and Community Health Services, New South Wales, Australia; Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, New South Wales, Australia.
- Claire M Vajdic
  - Kirby Institute, Faculty of Medicine, University of New South Wales, New South Wales, Australia
- Matthew Field
  - South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, New South Wales, Australia; South Western Sydney Cancer Services, NSW Health, New South Wales, Australia; Ingham Institute for Applied Medical Research, New South Wales, Australia
- Robert I Smee
  - Department of Radiation Oncology, Prince of Wales Hospital and Community Health Services, New South Wales, Australia; Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, New South Wales, Australia; Department of Radiation Oncology, Tamworth Base Hospital, Tamworth, New South Wales, Australia
4
Bitterman DS, Goldner E, Finan S, Harris D, Durbin EB, Hochheiser H, Warner JL, Mak RH, Miller T, Savova GK. An End-to-End Natural Language Processing System for Automatically Extracting Radiation Therapy Events From Clinical Texts. Int J Radiat Oncol Biol Phys 2023; 117:262-273. [PMID: 36990288 PMCID: PMC10522797 DOI: 10.1016/j.ijrobp.2023.03.055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 02/15/2023] [Accepted: 03/17/2023] [Indexed: 03/29/2023]
Abstract
PURPOSE Real-world evidence for radiation therapy (RT) is limited because it is often documented only in the clinical narrative. We developed a natural language processing system for automated extraction of detailed RT events from text to support clinical phenotyping. METHODS AND MATERIALS A multi-institutional data set of 96 clinician notes, 129 North American Association of Central Cancer Registries cancer abstracts, and 270 RT prescriptions from HemOnc.org was used and divided into train, development, and test sets. Documents were annotated for RT events and associated properties: dose, fraction frequency, fraction number, date, treatment site, and boost. Named entity recognition models for properties were developed by fine-tuning BioClinicalBERT and RoBERTa transformer models. A multiclass RoBERTa-based relation extraction model was developed to link each dose mention with each property in the same event. Models were combined with symbolic rules to create a hybrid end-to-end pipeline for comprehensive RT event extraction. RESULTS Named entity recognition models were evaluated on the held-out test set with F1 results of 0.96, 0.88, 0.94, 0.88, 0.67, and 0.94 for dose, fraction frequency, fraction number, date, treatment site, and boost, respectively. The relation model achieved an average F1 of 0.86 when the input was gold-labeled entities. The end-to-end system F1 result was 0.81. The end-to-end system performed best on North American Association of Central Cancer Registries abstracts (average F1 0.90), which are mostly copy-paste content from clinician notes. CONCLUSIONS We developed methods and a hybrid end-to-end system for RT event extraction, which is the first natural language processing system for this task. This system provides proof-of-concept for real-world RT data collection for research and is promising for the potential of natural language processing methods to support clinical care.
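The F1 results quoted above are reported per entity type, but the abstract does not say how predicted and gold entities were matched. A minimal sketch of one common convention, exact-match scoring over (start, end, type) spans, is shown below. The span representation, the exact-match rule, and the function name are assumptions for illustration rather than the evaluation protocol of the paper.

```python
def entity_f1(gold_spans, predicted_spans):
    """Exact-match precision, recall and F1 over (start, end, entity_type) tuples."""
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: the third prediction has a shifted start offset,
# so under exact matching it counts as both a false positive and a false negative.
gold = [(0, 5, "dose"), (10, 12, "fraction_number"), (20, 30, "treatment_site")]
predicted = [(0, 5, "dose"), (10, 12, "fraction_number"), (21, 30, "treatment_site")]
precision, recall, f1 = entity_f1(gold, predicted)
```

Relaxed (overlap-based) matching would credit the third prediction and generally yields higher scores, which is one reason the matching rule matters when comparing systems.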
Affiliation(s)
- Danielle S Bitterman
  - Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts; Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts; Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, Massachusetts.
- Eli Goldner
  - Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts
- Sean Finan
  - Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts
- David Harris
  - Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts
- Eric B Durbin
  - College of Medicine, University of Kentucky, Lexington, Kentucky; Kentucky Cancer Registry, Lexington, Kentucky
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Jeremy L Warner
  - Population Sciences Program, Legorreta Cancer Center, Brown University, Providence, Rhode Island; Lifespan Cancer Institute, Providence, Rhode Island
- Raymond H Mak
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts; Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, Massachusetts
- Timothy Miller
  - Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts
- Guergana K Savova
  - Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts
5
Mottin L, Goldman JP, Jäggli C, Achermann R, Gobeill J, Knafou J, Ehrsam J, Wicky A, Gérard CL, Schwenk T, Charrier M, Tsantoulis P, Lovis C, Leichtle A, Kiessling MK, Michielin O, Pradervand S, Foufi V, Ruch P. Multilingual RECIST classification of radiology reports using supervised learning. Front Digit Health 2023; 5:1195017. [PMID: 37388252 PMCID: PMC10303934 DOI: 10.3389/fdgth.2023.1195017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 06/05/2023] [Indexed: 07/01/2023] Open
Abstract
Objectives This study explores Artificial Intelligence and Natural Language Processing techniques to support the automatic assignment of the four Response Evaluation Criteria in Solid Tumors (RECIST) scales based on radiology reports. We also aim to evaluate how the languages and institutional specificities of Swiss teaching hospitals are likely to affect the quality of the classification in French and German. Methods In our approach, 7 machine learning methods were evaluated to establish a strong baseline. Then, robust models were built, fine-tuned according to the language (French and German), and compared with the expert annotation. Results The best strategies yield average F1-scores of 90% and 86% respectively for the 2-class (Progressive/Non-progressive) and the 4-class (Progressive Disease, Stable Disease, Partial Response, Complete Response) RECIST classification tasks. Conclusions These results are competitive with the manual labeling as measured by the Matthews correlation coefficient and Cohen's Kappa (79% and 76%). On this basis, we confirm the capacity of specific models to generalize on new unseen data and we assess the impact of using Pre-trained Language Models (PLMs) on the accuracy of the classifiers.
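One of the agreement measures cited above is the Matthews correlation coefficient. For the two-class (Progressive/Non-progressive) setting it can be computed directly from the four confusion-matrix counts, as in the sketch below. The function name and the convention of returning 0.0 when a marginal count is zero are choices made for this example, and the abstract does not say whether the reported figure comes from the binary or the multi-class form.

```python
import math


def matthews_corrcoef_binary(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention for the
    otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement), and unlike accuracy it stays informative when the two classes are imbalanced.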
Affiliation(s)
- Luc Mottin
  - HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
  - SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Jean-Philippe Goldman
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Christoph Jäggli
  - Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Rita Achermann
  - Department of Radiology, Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Julien Gobeill
  - HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
  - SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Knafou
  - HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
  - SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Ehrsam
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexandre Wicky
  - Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Camille L. Gérard
  - Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Tanja Schwenk
  - Department of Oncology, Kantonsspital Aarau, Aarau, Switzerland
- Mélinda Charrier
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Petros Tsantoulis
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christian Lovis
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexander Leichtle
  - Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Michael K. Kiessling
  - Department of Medical Oncology and Hematology, University Hospital Zurich, Zurich, Switzerland
- Olivier Michielin
  - Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Sylvain Pradervand
  - Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Vasiliki Foufi
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Patrick Ruch
  - HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
  - SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
6
Gao J, He S, Hu J, Chen G. A hybrid system to understand the relations between assessments and plans in progress notes. J Biomed Inform 2023; 141:104363. [PMID: 37054961 DOI: 10.1016/j.jbi.2023.104363] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/06/2023] [Revised: 04/05/2023] [Accepted: 04/07/2023] [Indexed: 04/15/2023]
Abstract
OBJECTIVE The paper presents a novel solution to the 2022 National NLP Clinical Challenges (n2c2) Track 3, which aims to predict the relations between assessment and plan subsections in progress notes. METHODS Our approach goes beyond standard transformer models and incorporates external information such as medical ontology and order information to comprehend the semantics of progress notes. We fine-tuned transformers to understand the textual data and incorporated medical ontology concepts and their relationships to enhance the model's accuracy. We also captured order information that regular transformers cannot by taking into account the position of the assessment and plan subsections in progress notes. RESULTS Our submission earned third place in the challenge phase with a macro-F1 score of 0.811. After refining our pipeline further, we achieved a macro-F1 of 0.826, outperforming the top-performing system during the challenge phase. CONCLUSION Our approach, which combines fine-tuned transformers, medical ontology, and order information, outperformed other systems in predicting the relationships between assessment and plan subsections in progress notes. This highlights the importance of incorporating external information beyond textual data in natural language processing (NLP) tasks related to medical documentation. Our work could potentially improve the efficiency and accuracy of progress note analysis.
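The challenge results above are ranked by macro-F1, which averages per-class F1 scores with equal weight regardless of class frequency. The sketch below shows the calculation on generic labels. The function name is hypothetical, and taking the label set as the union of gold and predicted labels is an assumption of this example; the official challenge scorer may fix the label set differently.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never matches.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

Because every class contributes equally, a system that ignores a rare relation type is penalised far more under macro-F1 than under accuracy or micro-F1.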
Affiliation(s)
- Jifan Gao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 610 Walnut St, Madison, 53726, WI, USA
- Shilu He
  - Department of Mathematics, University of Wisconsin-Madison, 480 Lincoln Dr, Madison, 53706, WI, USA
- Junjie Hu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 610 Walnut St, Madison, 53726, WI, USA.
- Guanhua Chen
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 610 Walnut St, Madison, 53726, WI, USA.
7
Tyagi N, Bhushan B. Demystifying the Role of Natural Language Processing (NLP) in Smart City Applications: Background, Motivation, Recent Advances, and Future Research Directions. Wireless Personal Communications 2023; 130:857-908. [PMID: 37168438 PMCID: PMC10019426 DOI: 10.1007/s11277-023-10312-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/25/2023] [Indexed: 05/13/2023]
Abstract
Smart cities provide an efficient infrastructure for the enhancement of the quality of life of the people by aiding in fast urbanization and resource management through sustainable and scalable innovative solutions. The penetration of Information and Communication Technology (ICT) in smart cities has been a major contributor to keeping up with the agility and pace of their development. In this paper, we have explored Natural Language Processing (NLP) which is one such technical discipline that has great potential in optimizing ICT processes and has so far been kept away from the limelight. Through this study, we have established the various roles that NLP plays in building smart cities after thoroughly analyzing its architecture, background, and scope. Subsequently, we present a detailed description of NLP's recent applications in the domain of smart healthcare, smart business, and industry, smart community, smart media, smart research, and development as well as smart education accompanied by NLP's open challenges at the very end. This work aims to throw light on the potential of NLP as one of the pillars in assisting the technical advancement and realization of smart cities.
Affiliation(s)
- Nemika Tyagi
  - Department of Computer Science and Engineering, School of Engineering and Technology, Sharda University, Greater Noida, Uttar Pradesh 201310, India
- Bharat Bhushan
  - Department of Computer Science and Engineering, School of Engineering and Technology, Sharda University, Greater Noida, Uttar Pradesh 201310, India
8
Kotevski DP, Smee RI, Vajdic CM, Field M. Empirical comparison of routinely collected electronic health record data for head and neck cancer-specific survival in machine-learnt prognostic models. Head Neck 2023; 45:365-379. [PMID: 36369773 PMCID: PMC10100433 DOI: 10.1002/hed.27241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 09/21/2022] [Accepted: 11/02/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Knowledge of the prognostic factors and performance of machine learning predictive models for 2-year cancer-specific survival (CSS) is limited in the head and neck cancer (HNC) population. METHODS Data from our facilities' oncology information system (OIS) collected for routine practice (OIS dataset, n = 430 patients) and research purposes (research dataset, n = 529 patients) were extracted on adults diagnosed between 2000 and 2017 with squamous cell carcinoma of the head and neck. RESULTS Machine learning demonstrated excellent performance (area under the curve, AUC) in the whole cohort (AUC = 0.97, research dataset), larynx cohort (AUC = 0.98, both datasets), and oropharynx cohort (AUC = 0.99, both datasets). Tumor site and T classification were identified as predictors of 2-year CSS in both datasets. Hypothyroidism and fitness for operation were further identified in the research dataset. CONCLUSIONS Datasets extracted from an OIS for routine clinical practice and research purposes demonstrated high utility for informing 2-year head and neck CSS.
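Performance in the abstract above is summarised by the area under the ROC curve (AUC). One way to read that number is as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes AUC from that pairwise definition, counting ties as half. The function name and the 0/1 label encoding are assumptions for this example, and the quadratic loop is meant for illustration rather than large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes; scores: predicted scores, higher meaning
    more likely positive. Tied scores count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correct += 1.0
            elif pos_score == neg_score:
                correct += 0.5
    return correct / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values of 0.97 to 0.99 as reported here mean almost every positive case was scored above almost every negative case in the evaluated data.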
Affiliation(s)
- Damian P Kotevski
  - Department of Radiation Oncology, Prince of Wales Hospital and Community Health Services, Sydney, New South Wales, Australia; Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Robert I Smee
  - Department of Radiation Oncology, Prince of Wales Hospital and Community Health Services, Sydney, New South Wales, Australia; Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia; Department of Radiation Oncology, Tamworth Base Hospital, Tamworth, New South Wales, Australia
- Claire M Vajdic
  - Centre for Big Data Research in Health, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia; Kirby Institute, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Matthew Field
  - South Western Sydney Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Sydney, New South Wales, Australia
9
Kotevski DP, Smee RI, Field M, Broadley K, Vajdic CM. The Utility of Oncology Information Systems for Prognostic Modelling in Head and Neck Cancer. J Med Syst 2023; 47:9. [PMID: 36640212 PMCID: PMC9840592 DOI: 10.1007/s10916-023-01907-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 01/03/2023] [Indexed: 01/15/2023]
Abstract
Cancer centres rely on electronic information in oncology information systems (OIS) to guide patient care. We investigated the completeness and accuracy of routinely collected head and neck cancer (HNC) data sourced from an OIS for suitability in prognostic modelling and other research. Three hundred and fifty-three adults diagnosed from 2000 to 2017 with head and neck squamous cell carcinoma, treated with radiotherapy, were eligible. Thirteen clinically relevant variables in HNC prognosis were extracted from a single-centre OIS and compared to that compiled separately in a research dataset. These two datasets were compared for agreement using Cohen's kappa coefficient for categorical variables, and intraclass correlation coefficients for continuous variables. Research data was 96% complete compared to 84% for OIS data. Agreement was perfect for gender (κ = 1.000), high for age (κ = 0.993), site (κ = 0.992), T (κ = 0.851) and N (κ = 0.812) stage, radiotherapy dose (κ = 0.889), fractions (κ = 0.856), and duration (κ = 0.818), and chemotherapy treatment (κ = 0.871), substantial for overall stage (κ = 0.791) and vital status (κ = 0.689), moderate for grade (κ = 0.547), and poor for performance status (κ = 0.110). Thirty-one other variables were poorly captured and could not be statistically compared. Documentation of clinical information within the OIS for HNC patients is routine practice; however, OIS data was less correct and complete than data collected for research purposes. Substandard collection of routine data may hinder advancements in patient care. Improved data entry, integration with clinical activities and workflows, system usability, data dictionaries, and training are necessary for OIS data to generate robust research. Data mining from clinical documents may supplement structured data collection.
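Agreement between the two datasets above is quantified with Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of matching values and p_e the proportion expected by chance from each source's marginal frequencies. A minimal sketch for categorical variables follows. The function name and the toy values are illustrative and are not taken from the study data.

```python
from collections import Counter


def cohens_kappa(source_a, source_b):
    """Cohen's kappa for two equal-length sequences of categorical values."""
    if not source_a or len(source_a) != len(source_b):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(source_a)
    observed = sum(1 for a, b in zip(source_a, source_b) if a == b) / n
    counts_a, counts_b = Counter(source_a), Counter(source_b)
    # Chance agreement: sum over categories of the product of marginal proportions.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources are constant and identical
    return (observed - expected) / (1.0 - expected)
```

In a toy case where two sources record T stage for four patients as `["T1", "T1", "T2", "T2"]` and `["T1", "T2", "T2", "T2"]`, the observed agreement is 0.75, the chance agreement is 0.5, and kappa is 0.5.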
Affiliation(s)
- Damian P Kotevski
  - Department of Radiation Oncology, Prince of Wales Hospital, Level 1, Bright Building, Barker St, Randwick, NSW, 2031, Australia.
  - Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, Kensington, NSW, Australia.
- Robert I Smee
  - Department of Radiation Oncology, Prince of Wales Hospital, Level 1, Bright Building, Barker St, Randwick, NSW, 2031, Australia
  - Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, Kensington, NSW, Australia
  - Department of Radiation Oncology, Tamworth Base Hospital, Tamworth, NSW, Australia
- Matthew Field
  - South Western Sydney Clinical School, Faculty of Medicine, University of New South Wales, Kensington, NSW, Australia
  - Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
- Kathryn Broadley
  - Cancer and Haematology Services, Prince of Wales Hospital, Randwick, NSW, Australia
- Claire M Vajdic
  - Centre for Big Data Research in Health, Faculty of Medicine, University of New South Wales, Kensington, NSW, Australia
  - Kirby Institute, Faculty of Medicine, University of New South Wales, Kensington, NSW, Australia
10
Kotevski DP, Smee RI, Vajdic CM, Field M. Machine Learning and Nomogram Prognostic Modeling for 2-Year Head and Neck Cancer-Specific Survival Using Electronic Health Record Data: A Multisite Study. JCO Clin Cancer Inform 2023; 7:e2200128. [PMID: 36596211 DOI: 10.1200/cci.22.00128] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023] Open
Abstract
PURPOSE There is limited knowledge of the prediction of 2-year cancer-specific survival (CSS) in the head and neck cancer (HNC) population. The aim of this study is to develop and validate machine learning models and a nomogram for the prediction of 2-year CSS in patients with HNC using real-world data collected by major teaching and tertiary referral hospitals in New South Wales (NSW), Australia. MATERIALS AND METHODS Data collected in oncology information systems at multiple NSW Cancer Centres were extracted for 2,953 eligible adults diagnosed between 2000 and 2017 with squamous cell carcinoma of the head and neck. Death data were sourced from the National Death Index using record linkage. Machine learning and Cox regression/nomogram models were developed and internally validated in Python and R, respectively. RESULTS Machine learning models demonstrated highest performance (C-index) in the larynx and nasopharynx cohorts (0.82), followed by the oropharynx (0.79) and the hypopharynx and oral cavity cohorts (0.73). In the whole HNC population, C-indexes of 0.79 and 0.70 and Brier scores of 0.10 and 0.27 were reported for the machine learning and nomogram model, respectively. Cox regression analysis identified age, T and N classification, and time-corrected biologic equivalent dose in two gray fractions as independent prognostic factors for 2-year CSS. N classification was the most important feature used for prediction in the machine learning model followed by age. CONCLUSION Machine learning and nomogram analysis predicted 2-year CSS with high performance using routinely collected and complete clinical information extracted from oncology information systems. These models function as visual decision-making tools to guide radiotherapy treatment decisions and provide insight into the prediction of survival outcomes in patients with HNC.
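To make the two reported metrics concrete, here is a minimal sketch of Harrell's concordance index and a plain Brier score. It is not the authors' code: the function names and toy data are hypothetical, and the Brier score shown ignores censoring, which a survival analysis would normally handle with censoring weights.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score orders correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had the event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


# Hypothetical follow-up times (months), event flags (1 = death) and risk scores.
c_index = concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.2])
# Hypothetical 2-year outcomes and predicted probabilities of death.
brier = brier_score([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.3])
```

A higher C-index means better ranking of patients by risk, while a lower Brier score means better calibrated probabilities.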
Affiliation(s)
- Damian P Kotevski
  - Department of Radiation Oncology, Prince of Wales Hospital and Community Health Services, Sydney, New South Wales, Australia
  - Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Robert I Smee
  - Department of Radiation Oncology, Prince of Wales Hospital and Community Health Services, Sydney, New South Wales, Australia
  - Prince of Wales Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
  - Department of Radiation Oncology, Tamworth Base Hospital, Tamworth, New South Wales, Australia
- Claire M Vajdic
  - Kirby Institute, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Matthew Field
  - South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, Sydney, New South Wales, Australia
  - South Western Sydney Cancer Services, NSW Health, Sydney, Sydney, New South Wales, Australia
  - Ingham Institute for Applied Medical Research, Sydney, New South Wales, Australia
|
11
|
Khosravi B, Rouzrokh P, Erickson BJ. Getting More Out of Large Databases and EHRs with Natural Language Processing and Artificial Intelligence: The Future Is Here. J Bone Joint Surg Am 2022; 104:51-55. [PMID: 36260045 DOI: 10.2106/jbjs.22.00567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/01/2023]
Abstract
Electronic health records (EHRs) have created great opportunities to collect various information from clinical patient encounters. However, most EHR data are stored in unstructured form (e.g., clinical notes, surgical notes, and medication instructions), and researchers need data to be in computable form (structured) to extract meaningful relationships involving variables that can influence patient outcomes. Clinical natural language processing (NLP) is the field of extracting structured data from unstructured text documents in EHRs. Clinical text has several characteristics that mandate the use of special techniques to extract structured information from them compared with generic NLP methods. In this article, we define clinical NLP models, introduce different methods of information extraction from unstructured data using NLP, and describe the basic technical aspects of how deep learning-based NLP models work. We conclude by noting the challenges of working with clinical NLP models and summarizing the general steps needed to launch an NLP project.
Affiliation(s)
- Bardia Khosravi
  - Radiology Informatics Lab (RIL), Department of Radiology, Mayo Clinic, Rochester, Minnesota
  - Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, Minnesota
- Pouria Rouzrokh
  - Radiology Informatics Lab (RIL), Department of Radiology, Mayo Clinic, Rochester, Minnesota
  - Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, Minnesota
- Bradley J Erickson
  - Radiology Informatics Lab (RIL), Department of Radiology, Mayo Clinic, Rochester, Minnesota
|
12
|
Zeng J, Gensheimer MF, Rubin DL, Athey S, Shachter RD. Uncovering interpretable potential confounders in electronic medical records. Nat Commun 2022; 13:1014. [PMID: 35197467 PMCID: PMC8866497 DOI: 10.1038/s41467-022-28546-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/25/2021] [Accepted: 01/28/2022] [Indexed: 12/25/2022] Open
Abstract
Randomized clinical trials (RCTs) are the gold standard for informing treatment decisions. Observational studies are often plagued by selection bias, and expert-selected covariates may insufficiently adjust for confounding. We explore how unstructured clinical text can be used to reduce selection bias and improve medical practice. We develop a framework based on natural language processing to uncover interpretable potential confounders from text. We validate our method by comparing the estimated hazard ratio (HR) with and without the confounders against established RCTs. We apply our method to four cohorts built from localized prostate and lung cancer datasets from the Stanford Cancer Institute and show that our method shifts the HR estimate towards the RCT results. The uncovered terms can also be interpreted by oncologists for clinical insights. We present this proof-of-concept study to enable more credible causal inference using observational data, uncover meaningful insights from clinical text, and inform high-stakes medical decisions. Observational studies are often plagued by selection bias, and expert-selected covariates may insufficiently adjust for confounding factors. Here, the authors develop a framework based on natural language processing to uncover interpretable potential confounders from text.
Affiliation(s)
- Jiaming Zeng
  - Department of Management Science and Engineering, Stanford University, Stanford, CA, 94305, USA
- Michael F Gensheimer
  - Department of Radiation Oncology, Stanford School of Medicine, Stanford, CA, 94305, USA
- Daniel L Rubin
  - Department of Biomedical Data Science, Radiology, and Medicine, Stanford School of Medicine, Stanford, CA, 94305, USA
- Susan Athey
  - Graduate School of Business, Stanford University, Stanford, CA, 94305, USA
- Ross D Shachter
  - Department of Management Science and Engineering, Stanford University, Stanford, CA, 94305, USA
|
13
|
Su D, Li Q, Zhang T, Veliz P, Chen Y, He K, Mahajan P, Zhang X. Prediction of acute appendicitis among patients with undifferentiated abdominal pain at emergency department. BMC Med Res Methodol 2022; 22:18. [PMID: 35026994 PMCID: PMC8759254 DOI: 10.1186/s12874-021-01490-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/30/2020] [Accepted: 12/08/2021] [Indexed: 11/12/2022] Open
Abstract
Background Early screening for, and accurate identification of, acute appendicitis (AA) among patients with undifferentiated symptoms associated with appendicitis during their emergency visit will improve patient safety and health care quality. The aim of the study was to compare models that predict AA among patients with undifferentiated symptoms at emergency visits using both structured data and free-text data from a national survey. Methods We performed a secondary data analysis of the 2005-2017 United States National Hospital Ambulatory Medical Care Survey (NHAMCS) data to estimate the association between a diagnosis of AA in emergency department (ED) patients and the demographic and clinical factors present during the ED visit. We used binary logistic regression (LR) and random forest (RF) models incorporating natural language processing (NLP) to predict AA diagnosis among patients with undifferentiated symptoms. Results Among the 40,441 ED patients with assigned International Classification of Diseases (ICD) codes of AA and appendicitis-related symptoms between 2005 and 2017, 655 adults (2.3%) and 256 children (2.2%) had AA. For the LR model identifying AA diagnosis among adult ED patients, the c-statistic was 0.72 (95% CI: 0.69–0.75) for structured variables only, 0.72 (95% CI: 0.69–0.75) for unstructured variables only, and 0.78 (95% CI: 0.76–0.80) when including both structured and unstructured variables. For the LR model identifying AA diagnosis among pediatric ED patients, the c-statistic was 0.84 (95% CI: 0.79–0.89) for structured variables only, 0.78 (95% CI: 0.72–0.84) for unstructured variables only, and 0.87 (95% CI: 0.83–0.91) when including both structured and unstructured variables. The RF models showed c-statistics similar to those of the corresponding LR models.
Conclusions We developed models that predict AA diagnosis for adult and pediatric ED patients; predictive accuracy improved with the inclusion of NLP elements and approaches. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01490-9.
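The abstract does not say how free text was turned into model inputs. One common approach, shown here purely as an assumption, is a bag-of-words count vector appended to the structured variables before fitting a classifier such as logistic regression. The helper names, example texts, and structured values below are all hypothetical.

```python
import re


def tokenize(text):
    """Lower-case a free-text field and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Map every token seen in the training texts to a column index."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}


def combine_features(structured, text, vocab):
    """Append bag-of-words counts for `text` to the structured variables."""
    counts = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:  # tokens unseen during training are ignored
            counts[vocab[tok]] += 1
    return list(structured) + counts


# Hypothetical reason-for-visit texts used to build the vocabulary.
training_texts = ["abdominal pain right lower quadrant", "nausea and abdominal pain"]
vocab = build_vocabulary(training_texts)

# Hypothetical structured variables: age in years and temperature in Celsius.
row = combine_features([34, 38.2], "Abdominal pain, nausea", vocab)
```

Each resulting row holds the structured variables followed by one count per vocabulary term, so the same classifier can be trained on structured data alone, text alone, or both, mirroring the three comparisons reported in the abstract.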
Affiliation(s)
- Dai Su
  - Department of Health Management and Policy, School of Public Health, Capital Medical University, Beijing, China
- Qinmengge Li
  - Department of Systems, Populations, and Leadership, University of Michigan School of Nursing, Ann Arbor, USA
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, USA
- Tao Zhang
  - Department of Epidemiology and Biostatistics, West China School of Public Health School, Sichuan University, Chengdu, China
- Philip Veliz
  - Department of Systems, Populations, and Leadership, University of Michigan School of Nursing, Ann Arbor, USA
- Yingchun Chen
  - Department of Health Management, School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Research Center for Rural Health Services, Hubei Province Key Research Institute of Humanities and Social Sciences, Wuhan, China
- Kevin He
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, USA
- Prashant Mahajan
  - Department of Emergency Medicine, University of Michigan School of Medicine, Ann Arbor, USA
- Xingyu Zhang
  - Thomas E. Starzl Transplantation Institute, University of Pittsburgh Medical Center, Pittsburgh, USA
|