1
Bilal M, Hamza A, Malik N. NLP for Analyzing Electronic Health Records and Clinical Notes in Cancer Research: A Review. J Pain Symptom Manage 2025; 69:e374-e394. [PMID: 39894080 DOI: 10.1016/j.jpainsymman.2025.01.019] [Received: 10/31/2024] [Revised: 12/31/2024] [Accepted: 01/20/2025] [Indexed: 02/04/2025]
Abstract
This review examines the application of natural language processing (NLP) techniques in cancer research using electronic health records (EHRs) and clinical notes. It addresses gaps in existing literature by providing a broader perspective than previous studies focused on specific cancer types or applications. A comprehensive literature search in the Scopus database identified 94 relevant studies published between 2019 and 2024. The analysis revealed a growing trend in NLP applications for cancer research, with information extraction (47 studies) and text classification (40 studies) emerging as predominant NLP tasks, followed by named entity recognition (7 studies). Among cancer types, breast, lung, and colorectal cancers were found to be the most studied. A significant shift from rule-based and traditional machine learning approaches to advanced deep learning techniques and transformer-based models was observed. It was found that dataset sizes used in existing studies varied widely, ranging from small, manually annotated datasets to large-scale EHRs. The review highlighted key challenges, including the limited generalizability of proposed solutions and the need for improved integration into clinical workflows. While NLP techniques show significant potential in analyzing EHRs and clinical notes for cancer research, future work should focus on improving model generalizability, enhancing robustness in handling complex clinical language, and expanding applications to understudied cancer types. The integration of NLP tools into palliative medicine and addressing ethical considerations remain crucial for utilizing the full potential of NLP in enhancing cancer diagnosis, treatment, and patient outcomes. This review provides valuable insights into the current state and future directions of NLP applications in cancer research.
Affiliation(s)
- Muhammad Bilal
- Department of Pharmaceutical Outcomes and Policy (M.B.), University of Florida, Gainesville, Florida, USA; Department of Software Engineering (M.B.), National University of Computer and Emerging Sciences, Islamabad, Pakistan.
- Ameer Hamza
- Department of Computer Science (A.H.), Faculty of Computing and IT, University of Sargodha, Sargodha, Punjab, Pakistan
- Nadia Malik
- Department of Software Engineering (N.M.), Faculty of Computing and IT, University of Sargodha, Sargodha, Punjab, Pakistan
2
Shen Y, Yu J, Zhou J, Hu G. Twenty-Five Years of Evolution and Hurdles in Electronic Health Records and Interoperability in Medical Research: Comprehensive Review. J Med Internet Res 2025; 27:e59024. [PMID: 39787599 PMCID: PMC11757985 DOI: 10.2196/59024] [Received: 03/31/2024] [Revised: 10/02/2024] [Accepted: 12/05/2024] [Indexed: 01/12/2025] [Open Access]
Abstract
BACKGROUND Electronic health records (EHRs) facilitate the accessibility and sharing of patient data among various health care providers, contributing to more coordinated and efficient care. OBJECTIVE This study aimed to summarize the evolution of secondary use of EHRs and their interoperability in medical research over the past 25 years. METHODS We conducted an extensive literature search in the PubMed, Scopus, and Web of Science databases using the keywords Electronic health record and Electronic medical record in the title or abstract and Medical research in all fields from 2000 to 2024. Specific terms were applied to different time periods. RESULTS The review yielded 2212 studies, all of which were then screened and processed in a structured manner. Of these 2212 studies, 2102 (93.03%) were included in the review analysis, of which 1079 (51.33%) studies were from 2000 to 2009, 582 (27.69%) were from 2010 to 2019, 251 (11.94%) were from 2020 to 2023, and 190 (9.04%) were from 2024. CONCLUSIONS The evolution of EHRs marks an important milestone in health care's journey toward integrating technology and medicine. From early documentation practices to the sophisticated use of artificial intelligence and big data analytics today, EHRs have become central to improving patient care, enhancing public health surveillance, and advancing medical research.
Affiliation(s)
- Yun Shen
- Chronic Disease Epidemiology, Population and Public Health, Pennington Biomedical Research Center, Baton Rouge, LA, United States
- Jiamin Yu
- Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jian Zhou
- Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Gang Hu
- Chronic Disease Epidemiology, Population and Public Health, Pennington Biomedical Research Center, Baton Rouge, LA, United States
3
Wiernik A, Rogado A, O'Mahony D, Abdul Razak AR. Elevating Cancer Care Standards Worldwide: An Analysis of Global Initiatives and Progress. JCO Glob Oncol 2024; 10:e2400199. [PMID: 39705636 DOI: 10.1200/go.24.00199] [Received: 05/06/2024] [Revised: 09/24/2024] [Accepted: 11/11/2024] [Indexed: 12/22/2024] [Open Access]
Abstract
Cancer remains a widespread and significant global health issue, with consequential impacts on individuals, families, and societies across the globe. Although there have been noteworthy advancements in the prevention, diagnosis, treatment, and study of cancer, the impact of this disease continues to be significant on health care systems and people worldwide. Furthermore, there are still differences in obtaining the advantages of modern cancer treatment, which can partly be attributed to the lack of standardized standards for providing top-notch cancer care. To tackle these difficulties, a multitude of projects and organizations have emerged to address the standard of cancer care on a global level. This paper provides a comprehensive review and analysis of the worldwide influence of programs and organizations that seek to improve the quality of cancer care. This document examines the progression of these initiatives, their cooperation with international organizations, possible paths for additional advancement, and suggestions for enhancing the standard of cancer treatment worldwide.
Affiliation(s)
- Andres Wiernik
- Cancer and Hematology Center, Metropolitano Hospital, San Jose, Costa Rica
- Alvaro Rogado
- Fundacion Excelencia y Calidad Oncología (ECO), Madrid, Spain
- Deirdre O'Mahony
- Department of Medical Oncology, Bon Secours Hospital, Cork, Ireland
4
Kehl KL, Jee J, Pichotta K, Paul MA, Trukhanov P, Fong C, Waters M, Bakouny Z, Xu W, Choueiri TK, Nichols C, Schrag D, Schultz N. Shareable artificial intelligence to extract cancer outcomes from electronic health records for precision oncology research. Nat Commun 2024; 15:9787. [PMID: 39532885 PMCID: PMC11557593 DOI: 10.1038/s41467-024-54071-x] [Received: 03/04/2024] [Accepted: 10/31/2024] [Indexed: 11/16/2024] [Open Access]
Abstract
Databases that link molecular data to clinical outcomes can inform precision cancer research into novel prognostic and predictive biomarkers. However, outside of clinical trials, cancer outcomes are typically recorded only in text form within electronic health records (EHRs). Artificial intelligence (AI) models have been trained to extract outcomes from individual EHRs. However, patient privacy restrictions have historically precluded dissemination of these models beyond the centers at which they were trained. In this study, the vulnerability of text classification models trained directly on protected health information to membership inference attacks is confirmed. A teacher-student distillation approach is applied to develop shareable models for annotating outcomes from imaging reports and medical oncologist notes. 'Teacher' models trained on EHR data from Dana-Farber Cancer Institute (DFCI) are used to label imaging reports and discharge summaries from the Medical Information Mart for Intensive Care (MIMIC)-IV dataset. 'Student' models are trained to use these MIMIC documents to predict the labels assigned by teacher models and sent to Memorial Sloan Kettering (MSK) for evaluation. The student models exhibit high discrimination across outcomes in both the DFCI and MSK test sets. Leveraging private labeling of public datasets to distill publishable clinical AI models from academic centers could facilitate deployment of machine learning to accelerate precision oncology research.
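The teacher-student idea in this abstract can be illustrated with a toy sketch. This is not the authors' models or data: the `teacher_label` rule stands in for a model trained on private records, and the word-vote student, the function names, and the example reports are all hypothetical. The point it shows is only that the student is fitted on public text carrying teacher-assigned labels, so it never touches the private training data.

```python
from collections import Counter

def teacher_label(text):
    # Hypothetical stand-in for a model trained on private EHR data.
    lowered = text.lower()
    return int("progression" in lowered or "enlarging" in lowered)

def train_student(public_docs, teacher):
    """Fit per-word vote counts on public text labeled only by the teacher."""
    votes = Counter()
    for doc in public_docs:
        label = teacher(doc)
        for word in set(doc.lower().split()):
            votes[word] += 1 if label else -1
    return votes

def student_predict(votes, text):
    """Predict 1 when the summed word votes are positive, else 0."""
    score = sum(votes[word] for word in set(text.lower().split()))
    return int(score > 0)

# Invented "public" reports used purely for illustration.
PUBLIC_DOCS = [
    "CT shows progression of disease",
    "enlarging hepatic lesions noted",
    "stable disease on CT",
    "no new lesions noted",
]

student = train_student(PUBLIC_DOCS, teacher_label)
print(student_predict(student, "MRI shows enlarging mass"))
print(student_predict(student, "stable findings on MRI"))
```

Only the `student` object would be shared in this sketch; the teacher and whatever it was trained on stay behind.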
Affiliation(s)
- Kenneth L Kehl
- Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA.
- Justin Jee
- Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA
- Karl Pichotta
- Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA
- Morgan A Paul
- Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA
- Pavel Trukhanov
- Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA
- Christopher Fong
- Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA
- Michele Waters
- Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA
- Ziad Bakouny
- Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA
- Wenxin Xu
- Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA
- Toni K Choueiri
- Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA
- Chelsea Nichols
- Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA
- Deborah Schrag
- Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA
- Nikolaus Schultz
- Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, USA
5
Fadul CE, Sheehan JP, Silvestre J, Bonilla G, Bovi JA, Ahluwalia M, Soffietti R, Hui D, Anderson RT. Defining the quality of interdisciplinary care for patients with brain metastases: modified Delphi panel recommendations. Lancet Oncol 2024; 25:e432-e440. [PMID: 39214114 DOI: 10.1016/s1470-2045(24)00198-0] [Received: 01/12/2024] [Revised: 03/21/2024] [Accepted: 04/04/2024] [Indexed: 09/04/2024]
Abstract
The value of interdisciplinary teams in improving outcomes and quality of care of patients with brain metastases remains uncertain, partly due to the lack of consensus on key indicators to evaluate interprofessional care. We aimed to obtain expert consensus across disciplines on indicators that evaluate the quality and value of brain metastases care. A steering committee of key opinion leaders curated relevant outcomes and process indicators from a literature review and a stakeholder needs assessment, and an international panel of physicians rated the outcomes and process indicators using a modified Delphi method. After three rounds, a consensus was reached on 29 indicators encompassing brain-directed oncological treatment, surgery, whole-brain radiotherapy, stereotactic radiosurgery, supportive or palliative care, and interdisciplinary team care. The Brain Metastases Quality-of-Care measure reflects the value and quality of brain metastases team-based care according to treatment modality and provides a benchmark of care for this under-studied patient population. The adoption, implementation, and sustainability of this set of indicators could help address the need expressed by patients with cancer, caregivers, and clinicians for more coordinated care across inpatient, outpatient, home, community, and tertiary academic settings.
Affiliation(s)
- Camilo E Fadul
- Department of Neurology, Division of Neuro-Oncology, University of Virginia School of Medicine, Charlottesville, VA, USA.
- Jason P Sheehan
- Department of Neurological Surgery, University of Virginia School of Medicine, Charlottesville, VA, USA
- Julio Silvestre
- Department of Palliative Care, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Gloribel Bonilla
- University of Virginia Comprehensive Cancer Center, Charlottesville, VA, USA
- Joseph A Bovi
- Department of Radiation Oncology, Medical College of Wisconsin, Milwaukee, WI, USA
- Manmeet Ahluwalia
- Miami Cancer Institute, Baptist Health South Florida, Miami, FL, USA
- Riccardo Soffietti
- Department of Neuroscience, Division of Neuro-Oncology, University of Turin and City of Health and Science University Hospital, Turin, Italy
- David Hui
- Department of Palliative Care, Rehabilitation and Integrative Medicine, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Roger T Anderson
- University of Virginia Comprehensive Cancer Center, Charlottesville, VA, USA; Department of Public Health, University of Virginia School of Medicine, Charlottesville, VA, USA
6
Goryachev SD, Yildirim C, DuMontier C, La J, Dharne M, Gaziano JM, Brophy MT, Munshi NC, Driver JA, Do NV, Fillmore NR. Natural Language Processing Algorithm to Extract Multiple Myeloma Stage From Oncology Notes in the Veterans Affairs Healthcare System. JCO Clin Cancer Inform 2024; 8:e2300197. [PMID: 39038255 PMCID: PMC11371094 DOI: 10.1200/cci.23.00197] [Received: 09/29/2023] [Revised: 03/14/2024] [Accepted: 05/06/2024] [Indexed: 07/24/2024] [Open Access]
Abstract
PURPOSE Stage in multiple myeloma (MM) is an essential measure of disease risk, but its measurement in large databases is often lacking. We aimed to develop and validate a natural language processing (NLP) algorithm to extract oncologists' documentation of stage in the national Veterans Affairs (VA) Healthcare System. METHODS Using nationwide electronic health record (EHR) and cancer registry data from the VA Corporate Data Warehouse, we developed and validated a rule-based NLP algorithm to extract oncologist-determined MM stage. To that end, a clinician annotated MM stage within over 5,000 short snippets of clinical notes, and annotated MM stage at MM treatment initiation for 200 patients. These were allocated into snippet- and patient-level development and validation sets. We developed MM stage extraction and roll-up algorithms within the development sets. After the algorithms were finalized, we validated them using standard measures in held-out validation sets. RESULTS We developed algorithms for three different MM staging systems that have been in widespread use (Revised International Staging System [R-ISS], International Staging System [ISS], and Durie-Salmon [DS]) and for stage reported without a clearly defined system. Precision and recall were uniformly high for MM stage at the snippet level, ranging from 0.92 to 0.99 for the different MM staging systems. Performance in identifying MM stage at treatment initiation at the patient level was also excellent, with precision of 0.92, 0.96, 0.90, and 0.86 and recall of 0.99, 0.98, 0.94, and 0.92 for R-ISS, ISS, DS, and unclear stage, respectively. CONCLUSION Our MM stage extraction algorithm uses rule-based NLP and data aggregation to accurately measure MM stage documented in oncology notes and pathology reports in VA's national EHR system. It may be adapted to other systems where MM stage is recorded in clinical notes.
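To make the rule-based approach concrete, here is a minimal sketch of how a staging mention might be pulled from a note snippet with regular expressions. It is not the published VA algorithm: the patterns, the `extract_stage` function, and the example snippets are hypothetical, and a real system would need many more rules (negation, historical mentions, roll-up to the patient level).

```python
import re

# Hypothetical patterns, NOT the published algorithm: each maps a staging
# system named in the abstract to a regex for "<system> [stage] <numeral>".
_NUMERAL = r"(?P<stage>III|II|I|3|2|1)"
PATTERNS = {
    "R-ISS": re.compile(r"\bR-?ISS\s*(?:stage\s*)?" + _NUMERAL + r"\b", re.I),
    # The lookbehind keeps plain "ISS" from matching inside "R-ISS".
    "ISS": re.compile(r"(?<![-\w])ISS\s*(?:stage\s*)?" + _NUMERAL + r"\b", re.I),
    # Durie-Salmon stages often carry an A/B suffix, which is skipped here.
    "DS": re.compile(
        r"\b(?:Durie[- ]Salmon|DS)\s*(?:stage\s*)?" + _NUMERAL + r"[AB]?\b", re.I
    ),
}
_ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III"}

def extract_stage(snippet):
    """Return a list of (system, stage) pairs found in one note snippet."""
    found = []
    for system, pattern in PATTERNS.items():
        for match in pattern.finditer(snippet):
            stage = match.group("stage").upper()
            found.append((system, _ARABIC_TO_ROMAN.get(stage, stage)))
    return found

print(extract_stage("R-ISS stage II multiple myeloma"))
print(extract_stage("ISS 3, Durie-Salmon stage IIIA"))
```

Normalizing Arabic numerals to Roman ones keeps the output comparable across notes; any patient-level roll-up would sit on top of these snippet-level hits.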
Affiliation(s)
- Sergey D. Goryachev
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA
- VA Boston Healthcare System, Boston, MA
- VA Boston Cooperative Studies Program, Boston, MA
- Cenk Yildirim
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA
- VA Boston Healthcare System, Boston, MA
- VA Boston Cooperative Studies Program, Boston, MA
- Clark DuMontier
- New England Geriatrics Research, Education and Clinical Center, VA Boston Healthcare System, Boston, MA
- Division of Aging, Brigham and Women's Hospital, Boston, MA
- Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA
- Harvard Medical School, Boston, MA
- Jennifer La
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA
- VA Boston Healthcare System, Boston, MA
- Harvard Medical School, Boston, MA
- J. Michael Gaziano
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA
- VA Boston Healthcare System, Boston, MA
- Division of Aging, Brigham and Women's Hospital, Boston, MA
- Harvard Medical School, Boston, MA
- Mary T. Brophy
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA
- VA Boston Healthcare System, Boston, MA
- VA Boston Cooperative Studies Program, Boston, MA
- Boston University School of Medicine, Boston, MA
- Nikhil C. Munshi
- VA Boston Healthcare System, Boston, MA
- Harvard Medical School, Boston, MA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Jane A. Driver
- New England Geriatrics Research, Education and Clinical Center, VA Boston Healthcare System, Boston, MA
- Division of Aging, Brigham and Women's Hospital, Boston, MA
- Harvard Medical School, Boston, MA
- Nhan V. Do
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA
- VA Boston Healthcare System, Boston, MA
- VA Boston Cooperative Studies Program, Boston, MA
- Boston University School of Medicine, Boston, MA
- Nathanael R. Fillmore
- Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA
- VA Boston Healthcare System, Boston, MA
- Harvard Medical School, Boston, MA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
7
Jabeen S, Rahman M, Siddique AB, Hasan M, Matin R, Rahman QSU, AKM TH, Alim A, Nadia N, Mahmud M, Islam J, Islam MS, Haider MS, Dewan F, Begum F, Barua U, Anam MT, Islam A, Razzak KSB, Ameen S, Hossain AT, Nahar Q, Ahmed A, El Arifeen S, Rahman AE. Introducing a digital emergency obstetric and newborn care register for indoor obstetric patient management: An implementation research in selected public health care facilities of Bangladesh. J Glob Health 2024; 14:04075. [PMID: 38722093 PMCID: PMC11082830 DOI: 10.7189/jogh.14.04075] [Indexed: 05/12/2024] [Open Access]
Abstract
Background Digital health records have emerged as vital tools for improving health care delivery and patient data management. Acknowledging the gaps in data recording by a paper-based register, the emergency obstetric and newborn care (EmONC) register used in the labour ward was digitised. In this study, we aimed to assess the implementation outcome of the digital register in selected public health care facilities in Bangladesh. Methods Extensive collaboration with stakeholders facilitated the development of an android-based electronic register from the paper-based register in the labour rooms of the selected district and sub-district level public health facilities of Bangladesh. We conducted a study to assess the implementation outcome of introducing the digital EmONC register in the labour ward. Results The digital register demonstrated high usability with a score of 83.7 according to the system usability scale, and health care providers found it highly acceptable, with an average score exceeding 95% using the technology acceptance model. The adoption rate reached an impressive 98% (95% confidence interval (CI) = 98-99), and fidelity stood at 90% (95% CI = 88-91) in the digital register, encompassing more than 80% of data elements. Notably, fidelity increased significantly over the implementation period of six months. The digital system proved a high utility rate of 89% (95% CI = 88-91), and all outcome variables exceeded the predefined benchmark. Conclusions The implementation outcome assessment underscores the potential of the digital register to enhance maternal and newborn health care in Bangladesh. Its user-friendliness, improved data completeness, and high adoption rates indicate its capacity to streamline health care data management and improve the quality of care.
Affiliation(s)
- Sabrina Jabeen
- International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Mahiur Rahman
- International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Mehedi Hasan
- International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Rubaiya Matin
- International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Azizul Alim
- Directorate General of Health Services, Ministry of Health and Family Welfare, Government of the People’s Republic of Bangladesh, Dhaka, Bangladesh
- Nuzhat Nadia
- Directorate General of Health Services, Ministry of Health and Family Welfare, Government of the People’s Republic of Bangladesh, Dhaka, Bangladesh
- Mustufa Mahmud
- Directorate General of Health Services, Ministry of Health and Family Welfare, Government of the People’s Republic of Bangladesh, Dhaka, Bangladesh
- Jahurul Islam
- Directorate General of Health Services, Ministry of Health and Family Welfare, Government of the People’s Republic of Bangladesh, Dhaka, Bangladesh
- Muhammad Shariful Islam
- Directorate General of Health Services, Ministry of Health and Family Welfare, Government of the People’s Republic of Bangladesh, Dhaka, Bangladesh
- Mohammad Sabbir Haider
- Directorate General of Health Services, Ministry of Health and Family Welfare, Government of the People’s Republic of Bangladesh, Dhaka, Bangladesh
- Farhana Dewan
- Obstetrical and Gynaecological Society of Bangladesh, Dhaka, Bangladesh
- Ferdousi Begum
- Obstetrical and Gynaecological Society of Bangladesh, Dhaka, Bangladesh
- Uchchash Barua
- International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Abirul Islam
- International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Shafiqul Ameen
- International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Quamrun Nahar
- International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Anisuddin Ahmed
- International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
- Shams El Arifeen
- International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
8
Pillai M, Posada J, Gardner RM, Hernandez-Boussard T, Bannett Y. Measuring quality-of-care in treatment of young children with attention-deficit/hyperactivity disorder using pre-trained language models. J Am Med Inform Assoc 2024; 31:949-957. [PMID: 38244997 PMCID: PMC10990536 DOI: 10.1093/jamia/ocae001] [Received: 09/26/2023] [Revised: 12/07/2023] [Accepted: 01/03/2024] [Indexed: 01/22/2024] [Open Access]
Abstract
OBJECTIVE To measure pediatrician adherence to evidence-based guidelines in the treatment of young children with attention-deficit/hyperactivity disorder (ADHD) in a diverse healthcare system using natural language processing (NLP) techniques. MATERIALS AND METHODS We extracted structured and free-text data from electronic health records (EHRs) of all office visits (2015-2019) of children aged 4-6 years in a community-based primary healthcare network in California, who had ≥1 visits with an ICD-10 diagnosis of ADHD. Two pediatricians annotated clinical notes of the first ADHD visit for 423 patients. Inter-annotator agreement (IAA) was assessed for the recommendation for the first-line behavioral treatment (F-measure = 0.89). Four pre-trained language models, including BioClinical Bidirectional Encoder Representations from Transformers (BioClinicalBERT), were used to identify behavioral treatment recommendations using a 70/30 train/test split. For temporal validation, we deployed BioClinicalBERT on 1,020 unannotated notes from other ADHD visits and well-care visits; all positively classified notes (n = 53) and 5% of negatively classified notes (n = 50) were manually reviewed. RESULTS Of 423 patients, 313 (74%) were male; 298 (70%) were privately insured; 138 (33%) were White; 61 (14%) were Hispanic. The BioClinicalBERT model trained on the first ADHD visits achieved F1 = 0.76, precision = 0.81, recall = 0.72, and AUC = 0.81 [0.72-0.89]. Temporal validation achieved F1 = 0.77, precision = 0.68, and recall = 0.88. Fairness analysis revealed low model performance in publicly insured patients (F1 = 0.53). CONCLUSION Deploying pre-trained language models on a variable set of clinical notes accurately captured pediatrician adherence to guidelines in the treatment of children with ADHD. Validating this approach in other patient populations is needed to achieve equitable measurement of quality of care at scale and improve clinical care for mental health conditions.
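As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the precision and recall figures quoted in the abstract; the helper name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as quoted in the abstract above.
first_visit = f1_score(0.81, 0.72)  # abstract reports F1 = 0.76
temporal = f1_score(0.68, 0.88)     # abstract reports F1 = 0.77

print(round(first_visit, 2), round(temporal, 2))  # prints: 0.76 0.77
```

Both recomputed values agree with the reported F1 scores to two decimal places.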
Affiliation(s)
- Malvika Pillai
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, United States
- Jose Posada
- Computer Science Department, University of the North, Barranquilla 080020, Colombia
- Rebecca M Gardner
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA 94305, United States
- Tina Hernandez-Boussard
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, United States
- Yair Bannett
- Division of Developmental-Behavioral Pediatrics, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, United States
9
Bazoge A, Morin E, Daille B, Gourraud PA. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review. JMIR Med Inform 2023; 11:e42477. [PMID: 38100200 PMCID: PMC10757232 DOI: 10.2196/42477] [Received: 09/05/2022] [Revised: 01/16/2023] [Accepted: 09/07/2023] [Indexed: 12/17/2023] [Open Access]
Abstract
BACKGROUND In recent years, health data collected during the clinical care process have been often repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible. OBJECTIVE The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks. METHODS This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English. RESULTS We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. 
NLP methods were mostly applied to English language data (153/194, 78.9%). CONCLUSIONS CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.
Affiliation(s)
- Adrien Bazoge
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
- Emmanuel Morin
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Béatrice Daille
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Pierre-Antoine Gourraud
- Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
- Nantes Université, INSERM, CHU de Nantes, École Centrale Nantes, Centre de Recherche Translationnelle en Transplantation et Immunologie, CR2TI, F-44000 Nantes, France
10
Petch J, Kempainnen J, Pettengell C, Aviv S, Butler B, Pond G, Saha A, Bogach J, Allard-Coutu A, Sztur P, Ranisau J, Levine M. Developing a Data and Analytics Platform to Enable a Breast Cancer Learning Health System at a Regional Cancer Center. JCO Clin Cancer Inform 2023; 7:e2200182. [PMID: 37001040 PMCID: PMC10281330 DOI: 10.1200/cci.22.00182] [Received: 11/29/2022] [Accepted: 02/10/2023] [Indexed: 04/03/2023] [Open Access]
Abstract
PURPOSE This study documents the creation of an automated, longitudinal, and prospective data and analytics platform for breast cancer at a regional cancer center. This platform combines principles of data warehousing with natural language processing (NLP) to provide the integrated, timely, meaningful, high-quality, and actionable data required to establish a learning health system. METHODS Data from six hospital information systems and one external data source were integrated on a nightly basis by automated extract/transform/load jobs. Free-text clinical documentation was processed using a commercial NLP engine. RESULTS The platform contains 141 data elements of 7,019 patients with newly diagnosed breast cancer who received care at our regional cancer center from January 1, 2014, to June 3, 2022. Daily updating of the database takes an average of 56 minutes. Evaluation of the tuning of NLP jobs found overall high performance, with an F1 of 1.0 for 19 variables and an F1 of > 0.95 for a further 16 variables. CONCLUSION This study describes how data warehousing combined with NLP can be used to create a prospective data and analytics platform to enable a learning health system. Although the upfront time investment required to create the platform was considerable, now that it has been developed, daily data processing is completed automatically in less than an hour.
Affiliation(s)
- Jeremy Petch
- Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
- Institute for Health Policy Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Division of Cardiology, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Population Health Research Institute, Hamilton Health Sciences, Hamilton, Canada
| | - Joel Kempainnen
- Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
| | | | | | | | - Greg Pond
- Escarpment Cancer Research Institute, Hamilton Health Sciences, Hamilton, Canada
| | - Ashirbani Saha
- Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
- Escarpment Cancer Research Institute, Hamilton Health Sciences, Hamilton, Canada
- Department of Oncology, Faculty of Health Sciences, McMaster University, Hamilton, Canada
| | - Jessica Bogach
- Department of Surgery, Faculty of Health Sciences, McMaster University, Hamilton, Canada
| | | | - Peter Sztur
- Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
| | - Jonathan Ranisau
- Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
| | - Mark Levine
- Hamilton Health Sciences, Hamilton, Canada
- Escarpment Cancer Research Institute, Hamilton Health Sciences, Hamilton, Canada
| |
|
11
|
Ferrara L, Otto M, Aapro M, Albreht T, Jonsson B, Oberst S, Oliver K, Pisani E, Presti P, Rubio IT, Terkola R, Tarricone R. How to improve efficiency in cancer care: dimensions, methods, and areas of evaluation. J Cancer Policy 2022; 34:100355. [PMID: 36007873 DOI: 10.1016/j.jcpo.2022.100355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Revised: 08/03/2022] [Accepted: 08/17/2022] [Indexed: 12/01/2022]
Abstract
Efficiency in healthcare is crucial since available resources are scarce, and the opportunity cost of an inefficient allocation is measured in health outcomes foregone. This is particularly relevant for cancer. The aim of this paper was to gain a comprehensive overview of how efficiency in cancer care is defined, and what the indicators, different methods, perspectives, and areas of evaluation are, to provide recommendations on the areas and dimensions where efficiency can be improved. METHODS: A comprehensive scoping literature review was performed searching four databases. Studies published between 2000 and 2021 were included if they described experiences and cases of efficiency in cancer care or methods to evaluate efficiency. The results of the literature review were then discussed during two rounds of online consultation with a panel of 15 external experts invited to provide their insights and comments to deliberate policy recommendations. RESULTS: 46 papers met the inclusion criteria. Based on the papers retrieved, we identified six areas for achieving efficiency gains throughout the entire care pathway and, for each area of efficiency, categorized the methods and outcomes used to measure efficiency gains. CONCLUSION: This is the first attempt to systematize a scattered body of literature on how to improve efficiency in cancer care and identify key areas to improve it. Based on the findings of the literature review and on the opinion of the experts involved in the consultation, we propose seven recommendations that are intended to improve efficiency in cancer care throughout the care pathway.
Affiliation(s)
- Lucia Ferrara
- Cergas SDA Bocconi School of management, via Sarfatti, 11 - 20136 Milano (Italy).
| | - Monica Otto
- Cergas SDA Bocconi School of management, via Sarfatti, 11 - 20136 Milano (Italy).
| | - Matti Aapro
- Genolier Hospital Genolier Cancer Center, SPCC - Sharing Progress in Cancer Care, Route du Muids 3, 1272 Genolier (Switzerland).
| | - Tit Albreht
- Centre for Health Care, National Institute of Public Health, Ljubljana, (Slovenia) iPAAC - Innovative Partnership for Action against Cancer.
| | - Bengt Jonsson
- Department of Economics, Stockholm School of Economics, Stockholm, Sweden.
| | - Simon Oberst
- OECI - Organisation of European Cancer Institutes, rue d'Egmont 11, B-1000 Brussels (Belgium).
| | - Kathy Oliver
- IBTA - International Brain Tumor Alliance, Tadworth, Surrey (United Kingdom).
| | - Eduardo Pisani
- All.Can - All.Can International asbl, Brussels, rue du Luxemburg 22-24, BE-1000 Brussels (Belgium).
| | - Pietro Presti
- SPCC - Sharing Progress in Cancer Care, Piazza Indipendenza 2, 6500 Bellinzona (Switzerland).
| | - Isabel T Rubio
- Clinica Universidad de Navarra, Madrid, ESSO - European Society of Surgical Oncology, Av. de Pío XII, 36, 31008 Pamplona, Navarra (Spain).
| | - Robert Terkola
- University Medical Center Groningen; University of Florida -College of Pharmacy; ESOP - European Society of oncology pharmacy.
| | | |
|
12
|
Zhang D, Song J, Dharmarajan S, Jung TH, Lee H, Ma Y, Zhang R, Levenson M. The Use of Machine Learning in Regulatory Drug Safety Evaluation. Stat Biopharm Res 2022. [DOI: 10.1080/19466315.2022.2108135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
Affiliation(s)
- Di Zhang
- Division of Biometrics VII, Office of Biostatistics, Center for Drug Evaluation and Research, U.S. Food and Drug Administration
| | - Jaejoon Song
- Division of Biometrics VII, Office of Biostatistics, Center for Drug Evaluation and Research, U.S. Food and Drug Administration
| | - Sai Dharmarajan
- Division of Biometrics VII, Office of Biostatistics, Center for Drug Evaluation and Research, U.S. Food and Drug Administration
| | - Tae Hyun Jung
- Division of Biometrics VII, Office of Biostatistics, Center for Drug Evaluation and Research, U.S. Food and Drug Administration
| | - Hana Lee
- Division of Biometrics VII, Office of Biostatistics, Center for Drug Evaluation and Research, U.S. Food and Drug Administration
| | - Yong Ma
- Division of Biometrics VII, Office of Biostatistics, Center for Drug Evaluation and Research, U.S. Food and Drug Administration
| | - Rongmei Zhang
- Division of Biometrics VII, Office of Biostatistics, Center for Drug Evaluation and Research, U.S. Food and Drug Administration
| | - Mark Levenson
- Division of Biometrics VII, Office of Biostatistics, Center for Drug Evaluation and Research, U.S. Food and Drug Administration
| |
|
13
|
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/19/2022] [Revised: 03/18/2022] [Accepted: 06/15/2022] [Indexed: 11/20/2022] Open
Abstract
PURPOSE The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), which is a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting said data elements. METHODS Published literature was searched to retrieve cancer-related NLP articles that were written in English and published between January 2010 and September 2020 from main literature databases. After the retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data including four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS A total of 123 publications were ultimately selected and included in our analysis. We found that cancer research and patient care require some data elements beyond mCODE, as expected. Transparency and reproducibility are not sufficient in NLP methods, and inconsistency in NLP evaluation exists. CONCLUSION We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers for wide adoption of cancer NLP were identified and discussed.
Affiliation(s)
- Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
| | - Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
| | - Andrew Wen
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
| | - Xiaoyang Ruan
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
| | - Huan He
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
| | - Sijia Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
| | - Sungrim Moon
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
| | - Michelle Mai
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
| | - Irbaz B. Riaz
- Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ
| | - Nan Wang
- Department of Computer Science and Engineering, College of Science and Engineering, University of Minnesota, Minneapolis, MN
| | - Ping Yang
- Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ
| | - Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
| | - Jeremy L. Warner
- Departments of Medicine (Hematology/Oncology), Vanderbilt University, Nashville, TN
- Department Biomedical Informatics, Vanderbilt University, Nashville, TN
| | - Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
| |
|
14
|
Grabar N, Grouin C. Year 2020 (with COVID): Observation of Scientific Literature on Clinical Natural Language Processing. Yearb Med Inform 2021; 30:257-263. [PMID: 34479397 PMCID: PMC8416212 DOI: 10.1055/s-0041-1726528] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/29/2022] Open
Abstract
Objectives:
To analyze the content of publications within the medical NLP domain in 2020.
Methods:
Automatic and manual preselection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues.
Results:
Three best papers have been selected in 2020. We also propose an analysis of the content of the NLP publications in 2020, all topics included.
Conclusion:
The two main issues addressed in 2020 are related to the investigation of COVID-related questions and to the further adaptation and use of transformer models. In addition, the trends from past years continue, such as the diversification of languages processed and the use of information from social networks.
Affiliation(s)
- Natalia Grabar
- Université Paris Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Orsay, France.,STL, CNRS, Université de Lille, Domaine du Pont-de-bois, Villeneuve-d'Ascq cedex, France
| | - Cyril Grouin
- Université Paris Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Orsay, France
| | | |
|
15
|
Segelov E, Carrington C, Aranda S, Currow D, Zalcberg JR, Heriot AG, Mileshkin L, Coutsouvelis J, Millar JL, Collopy BT, Emery JD, Zhang P, Cooper S, O'Kane C, Wale J, Hancock SJ, Sulkowski A, Bashford J. Developing clinical indicators for oncology: the inaugural cancer care indicator set for the Australian Council on Healthcare Standards. Med J Aust 2021; 214:528-531. [PMID: 34053081 DOI: 10.5694/mja2.51087] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/15/2023]
Abstract
INTRODUCTION The Australian Council on Healthcare Standards (ACHS) sponsored an expert-led, consensus-driven, four-stage process, based on a modified Delphi methodology, to determine a set of clinical indicators as quality measures of cancer service provision in Australia. This was done in response to requests from institutional health care providers seeking accreditation, which were additional and complementary to the existing radiation oncology set. The steering group members comprised multidisciplinary key opinion leaders and a consumer representative. Five additional participants constituted the stakeholder group, who deliberated on the final indicator set. METHODS AND RECOMMENDATIONS An initial meeting of the steering group scoped the high level nature of the desired set. In stage 2, 65 candidate indicators were identified by a literature review and a search of international metrics. These were ranked by survey, based on ease of data accessibility and collectability and clinical relevance. The top 27 candidates were debated by the stakeholder group and culled to a final set of 16 indicators. A user manual was created with indicators mapped to clinical codes. The indicator set was ratified by the Clinical Oncology Society of Australia and is now available for use by health care organisations participating in the ACHS Clinical Indicator Program. This inaugural cancer clinical indicator set covers high level assessment of various critical processes in cancer service provision in Australia. Regular reviews and updates will ensure usability. CHANGES IN MANAGEMENT AS A RESULT OF THIS STATEMENT This is the inaugural indicator set for cancer care for use across Australia and internationally under the ACHS Clinical Indicator Program. 
Multidisciplinary involvement through a modified Delphi process selected indicators representing both generic and specific aspects of care across the cancer journey pathway and will provide a functional tool to compare health care delivery across multiple settings. It is anticipated that this will drive continual improvement in cancer care provision.
Affiliation(s)
- Eva Segelov
- Monash University, Melbourne, VIC.,Monash Health, Melbourne, VIC
| | | | | | | | | | - Alexander G Heriot
- Epworth HealthCare, Melbourne, VIC.,Peter MacCallum Cancer Centre, Melbourne, VIC
| | | | | | - Jeremy L Millar
- Monash University, Melbourne, VIC.,Alfred Health, Melbourne, VIC
| | - Brian T Collopy
- CQM Consultants, Melbourne, VIC.,Australian Council on Healthcare Standards, Sydney, NSW
| | | | - Phoebe Zhang
- Australian Council on Healthcare Standards, Sydney, NSW
| | - Simon Cooper
- Australian Council on Healthcare Standards, Sydney, NSW
| | - Carmel O'Kane
- Wimmera Cancer Centre, Wimmera Health Care Group, Horsham, VIC
| | - Janet Wale
- Australian Council on Healthcare Standards, Sydney, NSW
| | | | | | | |
|
16
|
Coquet J, Bievre N, Billaut V, Seneviratne M, Magnani CJ, Bozkurt S, Brooks JD, Hernandez-Boussard T. Assessment of a Clinical Trial-Derived Survival Model in Patients With Metastatic Castration-Resistant Prostate Cancer. JAMA Netw Open 2021; 4:e2031730. [PMID: 33481032 PMCID: PMC7823224 DOI: 10.1001/jamanetworkopen.2020.31730] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022] Open
Abstract
IMPORTANCE Randomized clinical trials (RCTs) are considered the criterion standard for clinical evidence. Despite their many benefits, RCTs have limitations, such as costliness, that may reduce the generalizability of their findings among diverse populations and routine care settings. OBJECTIVE To assess the performance of an RCT-derived prognostic model that predicts survival among patients with metastatic castration-resistant prostate cancer (CRPC) when the model is applied to real-world data from electronic health records (EHRs). DESIGN, SETTING, AND PARTICIPANTS The RCT-trained model and patient data from the RCTs were obtained from the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge for prostate cancer, which occurred from March 16 to July 27, 2015. This challenge included 4 phase 3 clinical trials of patients with metastatic CRPC. Real-world data were obtained from the EHRs of a tertiary care academic medical center that includes a comprehensive cancer center. In this study, the DREAM challenge RCT-trained model was applied to real-world data from January 1, 2008, to December 31, 2019; the model was then retrained using EHR data with optimized feature selection. Patients with metastatic CRPC were divided into RCT and EHR cohorts based on data source. Data were analyzed from March 23, 2018, to October 22, 2020. EXPOSURES Patients who received treatment for metastatic CRPC. MAIN OUTCOMES AND MEASURES The primary outcome was the performance of an RCT-derived prognostic model that predicts survival among patients with metastatic CRPC when the model is applied to real-world data. Model performance was compared using 10-fold cross-validation according to time-dependent integrated area under the curve (iAUC) statistics. RESULTS Among 2113 participants with metastatic CRPC, 1600 participants were included in the RCT cohort, and 513 participants were included in the EHR cohort. 
The RCT cohort comprised a larger proportion of White participants (1390 patients [86.9%] vs 337 patients [65.7%]) and a smaller proportion of Hispanic participants (14 patients [0.9%] vs 42 patients [8.2%]), Asian participants (41 patients [2.6%] vs 88 patients [17.2%]), and participants older than 75 years (388 patients [24.3%] vs 191 patients [37.2%]) compared with the EHR cohort. Participants in the RCT cohort also had fewer comorbidities (mean [SD], 1.6 [1.8] comorbidities vs 2.5 [2.6] comorbidities, respectively) compared with those in the EHR cohort. Of the 101 variables used in the RCT-derived model, 10 were not available in the EHR data set, 3 of which were among the top 10 features in the DREAM challenge RCT model. The best-performing EHR-trained model included only 25 of the 101 variables included in the RCT-trained model. The performance of the RCT-trained and EHR-trained models was adequate in the EHR cohort (mean [SD] iAUC, 0.722 [0.118] and 0.762 [0.106], respectively); model optimization was associated with improved performance of the best-performing EHR model (mean [SD] iAUC, 0.792 [0.097]). The EHR-trained model classified 256 patients as having a high risk of mortality and 256 patients as having a low risk of mortality (hazard ratio, 2.7; 95% CI, 2.0-3.7; log-rank P < .001). CONCLUSIONS AND RELEVANCE In this study, although the RCT-trained models did not perform well when applied to real-world EHR data, retraining the models using real-world EHR data and optimizing variable selection was beneficial for model performance. As clinical evidence evolves to include more real-world data, both industry and academia will likely search for ways to balance model optimization with generalizability. This study provides a pragmatic approach to applying RCT-trained models to real-world data.
Affiliation(s)
- Jean Coquet
- Department of Medicine, Stanford University School of Medicine, Stanford, California
| | - Nicolas Bievre
- Department of Statistics, Stanford University, Stanford, California
| | - Vincent Billaut
- Department of Statistics, Stanford University, Stanford, California
| | - Martin Seneviratne
- Department of Medicine, Stanford University School of Medicine, Stanford, California
- Department of Biomedical Data Science, Stanford University, Stanford, California
| | | | - Selen Bozkurt
- Department of Medicine, Stanford University School of Medicine, Stanford, California
| | - James D. Brooks
- Department of Urology, Stanford University School of Medicine, Stanford, California
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, California
| | - Tina Hernandez-Boussard
- Department of Medicine, Stanford University School of Medicine, Stanford, California
- Department of Biomedical Data Science, Stanford University, Stanford, California
- Department of Surgery, Stanford University School of Medicine, Stanford, California
| |
|
17
|
Magnani CJ, Bievre N, Baker LC, Brooks JD, Blayney DW, Hernandez-Boussard T. Real-world Evidence to Estimate Prostate Cancer Costs for First-line Treatment or Active Surveillance. EUR UROL SUPPL 2020; 23:20-29. [PMID: 33367287 PMCID: PMC7751921 DOI: 10.1016/j.euros.2020.11.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/25/2022] Open
Abstract
Background Prostate cancer is the most common cancer in men and the second leading cause of cancer-related deaths. Changes in screening guidelines, adoption of active surveillance (AS), and implementation of high-cost technologies have changed treatment costs. Traditional cost-effectiveness studies rely on clinical trial protocols unlikely to capture actual practice behavior, and existing studies use data predating new technologies. Real-world evidence reflecting these changes is lacking. Objective To assess real-world costs of first-line prostate cancer management. Design, setting, and participants We used clinical electronic health records for 2008-2018 linked with the California Cancer Registry and the Medicare Fee Schedule to assess costs over 24 or 60 mo following diagnosis. We identified surgery or radiation treatments with structured methods, while we used both structured data and natural language processing to identify AS. Outcome measurements and statistical analysis Our results are risk-stratified calculated cost per day (CCPD) for first-line management, which are independent of treatment duration. We used the Kruskal-Wallis test to compare unadjusted CCPD, while analysis of covariance log-linear models adjusted estimates for age and Charlson comorbidity. Results and limitations In 3433 patients, surgery (54.6%) was more common than radiation (22.3%) or AS (23.0%). Two years following diagnosis, AS ($2.97/d) was cheaper than surgery ($5.67/d) or radiation ($9.34/d) in favorable disease, while surgery ($7.17/d) was cheaper than radiation ($16.34/d) for unfavorable disease. At 5 yr, AS ($2.71/d) remained slightly cheaper than surgery ($2.87/d) and radiation ($4.36/d) in favorable disease, while for unfavorable disease surgery ($4.15/d) remained cheaper than radiation ($10.32/d). Study limitations include information derived from a single healthcare system and costs based on benchmark Medicare estimates rather than actual payment exchanges. 
Patient summary Active surveillance was cheaper than surgery (-47.6%) and radiation (-68.2%) at 2 yr for favorable-risk disease, which decreased by 5 yr (-5.6% and -37.8%, respectively). Surgery was less costly than radiation for unfavorable risk for both intervals (-56.1% and -59.8%, respectively).
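The percentage differences quoted in the patient summary above follow directly from the per-day costs reported in the results. The short Python sketch below reproduces that arithmetic as an illustrative check; it is not part of the study, and the helper name `pct_diff` and the dictionary layout are assumptions made for this example.

```python
def pct_diff(cost: float, reference: float) -> float:
    """Relative difference of `cost` versus `reference`, in percent (1 decimal)."""
    return round((cost - reference) / reference * 100, 1)


# Calculated cost per day (USD/d) as reported in the abstract.
favorable = {
    "2yr": {"AS": 2.97, "surgery": 5.67, "radiation": 9.34},
    "5yr": {"AS": 2.71, "surgery": 2.87, "radiation": 4.36},
}
unfavorable = {
    "2yr": {"surgery": 7.17, "radiation": 16.34},
    "5yr": {"surgery": 4.15, "radiation": 10.32},
}

# Favorable-risk disease: active surveillance (AS) versus each treatment.
as_vs_surgery = {t: pct_diff(c["AS"], c["surgery"]) for t, c in favorable.items()}
as_vs_radiation = {t: pct_diff(c["AS"], c["radiation"]) for t, c in favorable.items()}

# Unfavorable-risk disease: surgery versus radiation.
surgery_vs_radiation = {
    t: pct_diff(c["surgery"], c["radiation"]) for t, c in unfavorable.items()
}

print(as_vs_surgery)         # 2 yr and 5 yr, AS relative to surgery
print(as_vs_radiation)       # 2 yr and 5 yr, AS relative to radiation
print(surgery_vs_radiation)  # 2 yr and 5 yr, surgery relative to radiation
```

Each value is computed as (cost - reference) / reference, so a negative number means the first option costs less per day than the reference.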
Affiliation(s)
| | - Nicolas Bievre
- Department of Statistics, Stanford University, Stanford, CA, USA
| | - Laurence C Baker
- Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA
| | - James D Brooks
- Department of Urology, Stanford University, Stanford, CA, USA
| | - Douglas W Blayney
- Department of Medicine, School of Medicine, Stanford University, Stanford, CA, USA.,Stanford Cancer Institute, School of Medicine, Stanford University, CA, USA.,Clinical Excellence Research Center, School of Medicine, Stanford University, CA, USA
| | | |
|
18
|
Wang SS, Goodman MT, Bondy M. Modernizing Population Sciences in the Digital Age. Cancer Epidemiol Biomarkers Prev 2020; 29:712-713. [PMID: 32238400 DOI: 10.1158/1055-9965.epi-20-0268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2020] [Accepted: 02/19/2020] [Indexed: 11/16/2022] Open
Affiliation(s)
- Sophia S Wang
- Division of Health Analytics, Department of Computational and Quantitative Medicine, City of Hope, Duarte, California.
| | - Marc T Goodman
- Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California
| | - Melissa Bondy
- Department of Epidemiology and Population Health, Stanford School of Medicine, Stanford University, Stanford, California
| |
|