Reference Citation Analysis

For: Kehl KL, Xu W, Lepisto E, Elmarakeby H, Hassett MJ, Van Allen EM, Johnson BE, Schrag D. Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes. JCO Clin Cancer Inform 2021;4:680-690. [PMID: 32755459 DOI: 10.1200/cci.20.00020] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 01/15/2023] [Open Access]

Cited by Other Article(s)

Zeinali N, Albashayreh A, Fan W, White SG. Symptom-BERT: Enhancing Cancer Symptom Detection in EHR Clinical Notes. J Pain Symptom Manage 2024:S0885-3924(24)00784-X. [PMID: 38789092 DOI: 10.1016/j.jpainsymman.2024.05.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 05/08/2024] [Accepted: 05/14/2024] [Indexed: 05/26/2024]

Abstract

CONTEXT

Extracting cancer symptom documentation allows clinicians to develop highly individualized symptom prediction algorithms to deliver symptom management care. Leveraging advanced language models to detect symptom data in clinical narratives can significantly enhance this process.

OBJECTIVE

This study uses a pretrained large language model to detect and extract cancer symptoms in clinical notes.

METHODS

We developed a pretrained language model to identify cancer symptoms in clinical notes based on a clinical corpus from the Enterprise Data Warehouse for Research at a healthcare system in the Midwestern United States. This study was conducted in 4 phases: (1) pretraining a Bio-Clinical BERT model on one million unlabeled clinical documents, (2) fine-tuning Symptom-BERT for detecting 13 cancer symptom groups within 1112 annotated clinical notes, (3) generating 180 synthetic clinical notes using ChatGPT-4 for external validation, and (4) comparing the internal and external performance of Symptom-BERT against a non-pretrained version and six other BERT implementations.

RESULTS

The Symptom-BERT model effectively detected cancer symptoms in clinical notes. It achieved a micro-averaged F1-score of 0.933 and an AUC of 0.929 on internal validation, and 0.831 and 0.834, respectively, on external validation. Our analysis shows that physical symptoms, such as pruritus, are typically identified with higher performance than psychological symptoms, such as anxiety.

CONCLUSION

This study underscores the transformative potential of specialized pretraining on domain-specific data in boosting the performance of language models for medical applications. The Symptom-BERT model's exceptional efficacy in detecting cancer symptoms heralds a groundbreaking stride in patient-centered AI technologies, offering a promising path to elevate symptom management and cultivate superior patient self-care outcomes.
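To make the headline metric in this abstract concrete, the following minimal sketch shows one way a micro-averaged F1-score can be computed for multi-label predictions, such as per-note flags for several symptom groups. The function name `micro_f1` and the toy label matrices are illustrative assumptions; they are not the authors' code or data.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives
    over every (note, label) pair before computing a single F1 value."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 3 notes x 3 symptom labels (1 = symptom present)
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]

score = micro_f1(y_true, y_pred)
print(f"micro-averaged F1 = {score:.3f}")
```

Because counts are pooled across labels, frequent symptom groups weigh more heavily in a micro-average than rare ones, which is worth keeping in mind when comparing it with per-symptom results.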

Zhang C, Xu J, Tang R, Yang J, Wang W, Yu X, Shi S. Novel research and future prospects of artificial intelligence in cancer diagnosis and treatment. J Hematol Oncol 2023;16:114. [PMID: 38012673 PMCID: PMC10680201 DOI: 10.1186/s13045-023-01514-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 11/20/2023] [Indexed: 11/29/2023] [Open Access]

Affiliation(s)

Chaoyi Zhang: Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China; Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China; Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
Jin Xu: Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China; Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China; Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
Rong Tang: Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China; Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China; Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
Jianhui Yang: Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China; Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China; Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
Wei Wang: Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China; Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China; Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
Xianjun Yu: Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China; Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China; Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
Si Shi: Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China; Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China; Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China

Kotevski DP, Vajdic CM, Field M, Smee RI. Inter-hospital variation in data collection, radiotherapy treatment, and survival in patients with head and neck cancer: A multisite study. Radiother Oncol 2023;188:109843. [PMID: 37543056 DOI: 10.1016/j.radonc.2023.109843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 06/14/2023] [Accepted: 07/27/2023] [Indexed: 08/07/2023]

Abstract

BACKGROUND AND PURPOSE

Inter-hospital inequalities in head and neck cancer (HNC) survival may exist due to variation in radiotherapy treatment-related factors. This study investigated inter-hospital variation in data collection, primary radiotherapy treatment, and survival in HNC patients from an Australian setting.

MATERIALS AND METHODS

Data collected in oncology information systems (OIS) from seven Australian hospitals was extracted for 3,182 adults treated with curative radiotherapy, with or without surgery or chemotherapy, for primary, non-metastatic squamous cell carcinoma of the head and neck (2000-2017). Death data was sourced from the National Death Index using record linkage. Multivariable Cox regression was used to assess the association between survival and hospital.

RESULTS

Inter-hospital variation in data collection, primary radiotherapy dose, and five-year HNC-related death was detected. Completion of eleven fields ranged from 66%-98%. Primary radiotherapy treated Tis-T1N0 glottic and any stage oral cavity and oropharynx cancers received significantly different time-corrected biologically equivalent dose in two gray fractions (EQD2T) by hospital, with observed deviation from Australian radiotherapy guidelines. Increased EQD2T dose was associated with a reduced risk of five-year HNC-related death in all patients and those treated with primary radiotherapy. Hospital, tumour site, and T and N classification were also identified as independent prognostic factors for five-year HNC-related death in all patients treated with radiotherapy.

CONCLUSION

Unexplained variation exists in HNC-related death in patients treated at Australian hospitals. Available routinely collected data in OIS are insufficient to explain variation in survival. Innovative data collection, extraction, and classification practices are needed to inform clinical practice.

Elbatarny L, Do RKG, Gangai N, Ahmed F, Chhabra S, Simpson AL. Applying Natural Language Processing to Single-Report Prediction of Metastatic Disease Response Using the OR-RADS Lexicon. Cancers (Basel) 2023;15:4909. [PMID: 37894276 PMCID: PMC10605614 DOI: 10.3390/cancers15204909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 09/25/2023] [Accepted: 09/26/2023] [Indexed: 10/29/2023] [Open Access]

Tan RSYC, Lin Q, Low GH, Lin R, Goh TC, Chang CCE, Lee FF, Chan WY, Tan WC, Tey HJ, Leong FL, Tan HQ, Nei WL, Chay WY, Tai DWM, Lai GGY, Cheng LTE, Wong FY, Chua MCH, Chua MLK, Tan DSW, Thng CH, Tan IBH, Ng HT. Inferring cancer disease response from radiology reports using large language models with data augmentation and prompting. J Am Med Inform Assoc 2023;30:1657-1664. [PMID: 37451682 PMCID: PMC10531105 DOI: 10.1093/jamia/ocad133] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/25/2023] [Revised: 06/27/2023] [Accepted: 07/04/2023] [Indexed: 07/18/2023] [Open Access]

Affiliation(s)

Ryan Shea Ying Cong Tan: Division of Medical Oncology, National Cancer Centre Singapore, Singapore; Duke-NUS Medical School, Singapore
Qian Lin: Department of Computer Science, National University of Singapore, Singapore
Guat Hwa Low: Division of Medical Oncology, National Cancer Centre Singapore, Singapore
Ruixi Lin: Department of Computer Science, National University of Singapore, Singapore
Tzer Chew Goh: Institute of Systems Science, National University of Singapore, Singapore
Christopher Chu En Chang: Institute of Systems Science, National University of Singapore, Singapore
Fung Fung Lee: Institute of Systems Science, National University of Singapore, Singapore
Wei Yin Chan: Institute of Systems Science, National University of Singapore, Singapore
Wei Chong Tan: Division of Medical Oncology, National Cancer Centre Singapore, Singapore; Duke-NUS Medical School, Singapore
Han Jieh Tey: Division of Medical Oncology, National Cancer Centre Singapore, Singapore
Fun Loon Leong: Division of Medical Oncology, National Cancer Centre Singapore, Singapore
Hong Qi Tan: Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
Wen Long Nei: Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
Wen Yee Chay: Division of Medical Oncology, National Cancer Centre Singapore, Singapore; Duke-NUS Medical School, Singapore
David Wai Meng Tai: Division of Medical Oncology, National Cancer Centre Singapore, Singapore; Duke-NUS Medical School, Singapore
Gillianne Geet Yi Lai: Division of Medical Oncology, National Cancer Centre Singapore, Singapore; Duke-NUS Medical School, Singapore
Lionel Tim-Ee Cheng: Duke-NUS Medical School, Singapore; Department of Diagnostic Radiology, Singapore General Hospital, Singapore
Fuh Yong Wong: Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
Matthew Chin Heng Chua: Yong Loo Lin School of Medicine, National University of Singapore, Singapore
Melvin Lee Kiang Chua: Duke-NUS Medical School, Singapore; Division of Radiation Oncology, National Cancer Centre Singapore, Singapore; Data and Computational Science Core, National Cancer Centre Singapore, Singapore
Daniel Shao Weng Tan: Division of Medical Oncology, National Cancer Centre Singapore, Singapore; Division of Clinical Trials and Epidemiological Sciences, National Cancer Centre Singapore, Singapore
Choon Hua Thng: Duke-NUS Medical School, Singapore; Division of Oncologic Imaging, National Cancer Centre Singapore, Singapore
Iain Bee Huat Tan: Division of Medical Oncology, National Cancer Centre Singapore, Singapore; Duke-NUS Medical School, Singapore; Data and Computational Science Core, National Cancer Centre Singapore, Singapore
Hwee Tou Ng: Department of Computer Science, National University of Singapore, Singapore

Elmarakeby HA, Trukhanov PS, Arroyo VM, Riaz IB, Schrag D, Van Allen EM, Kehl KL. Empirical evaluation of language modeling to ascertain cancer outcomes from clinical text reports. BMC Bioinformatics 2023;24:328. [PMID: 37658330 PMCID: PMC10474750 DOI: 10.1186/s12859-023-05439-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 08/07/2023] [Indexed: 09/03/2023] [Open Access]

Abstract

BACKGROUND

Longitudinal data on key cancer outcomes for clinical research, such as response to treatment and disease progression, are not captured in standard cancer registry reporting. Manual extraction of such outcomes from unstructured electronic health records is a slow, resource-intensive process. Natural language processing (NLP) methods can accelerate outcome annotation, but they require substantial labeled data. Transfer learning based on language modeling, particularly using the Transformer architecture, has achieved improvements in NLP performance. However, there has been no systematic evaluation of NLP model training strategies on the extraction of cancer outcomes from unstructured text.

RESULTS

We evaluated the performance of nine NLP models at the two tasks of identifying cancer response and cancer progression within imaging reports at a single academic center among patients with non-small cell lung cancer. We trained the classification models under different conditions, including training sample size, classification architecture, and language model pre-training. The training involved a labeled dataset of 14,218 imaging reports for 1112 patients with lung cancer. A subset of models was based on a pre-trained language model, DFCI-ImagingBERT, created by further pre-training a BERT-based model using an unlabeled dataset of 662,579 reports from 27,483 patients with cancer from our center. A classifier based on our DFCI-ImagingBERT, trained on more than 200 patients, achieved the best results in most experiments; however, these results were marginally better than simpler "bag of words" or convolutional neural network models.

CONCLUSION

When developing AI models to extract outcomes from imaging reports for clinical cancer research, if computational resources are plentiful but labeled training data are limited, large language models can be used for zero- or few-shot learning to achieve reasonable performance. When computational resources are more limited but labeled training data are readily available, even simple machine learning architectures can achieve good performance for such tasks.

Solarte-Pabón O, Montenegro O, García-Barragán A, Torrente M, Provencio M, Menasalvas E, Robles V. Transformers for extracting breast cancer information from Spanish clinical narratives. Artif Intell Med 2023;143:102625. [PMID: 37673566 DOI: 10.1016/j.artmed.2023.102625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 05/11/2023] [Accepted: 07/08/2023] [Indexed: 09/08/2023]

Moon I, LoPiccolo J, Baca SC, Sholl LM, Kehl KL, Hassett MJ, Liu D, Schrag D, Gusev A. Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary. Nat Med 2023;29:2057-2067. [PMID: 37550415 DOI: 10.1038/s41591-023-02482-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/06/2023] [Accepted: 06/30/2023] [Indexed: 08/09/2023]

Mottin L, Goldman JP, Jäggli C, Achermann R, Gobeill J, Knafou J, Ehrsam J, Wicky A, Gérard CL, Schwenk T, Charrier M, Tsantoulis P, Lovis C, Leichtle A, Kiessling MK, Michielin O, Pradervand S, Foufi V, Ruch P. Multilingual RECIST classification of radiology reports using supervised learning. Front Digit Health 2023;5:1195017. [PMID: 37388252 PMCID: PMC10303934 DOI: 10.3389/fdgth.2023.1195017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 06/05/2023] [Indexed: 07/01/2023] [Open Access]

Affiliation(s)

Luc Mottin: HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland; SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
Jean-Philippe Goldman: Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
Christoph Jäggli: Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
Rita Achermann: Department of Radiology, Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
Julien Gobeill: HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland; SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
Julien Knafou: HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland; SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
Julien Ehrsam: Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
Alexandre Wicky: Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
Camille L. Gérard: Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
Tanja Schwenk: Department of Oncology, Kantonsspital Aarau, Aarau, Switzerland
Mélinda Charrier: Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
Petros Tsantoulis: Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
Christian Lovis: Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
Alexander Leichtle: Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
Michael K. Kiessling: Department of Medical Oncology and Hematology, University Hospital Zurich, Zurich, Switzerland
Olivier Michielin: Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
Sylvain Pradervand: Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
Vasiliki Foufi: Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
Patrick Ruch: HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland; SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland

Khan MS, Usman MS, Talha KM, Van Spall HGC, Greene SJ, Vaduganathan M, Khan SS, Mills NL, Ali ZA, Mentz RJ, Fonarow GC, Rao SV, Spertus JA, Roe MT, Anker SD, James SK, Butler J, McGuire DK. Leveraging electronic health records to streamline the conduct of cardiovascular clinical trials. Eur Heart J 2023;44:1890-1909. [PMID: 37098746 DOI: 10.1093/eurheartj/ehad171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2022] [Revised: 02/05/2023] [Accepted: 03/07/2023] [Indexed: 04/27/2023] [Open Access]

Abstract

Conventional randomized controlled trials (RCTs) can be expensive, time intensive, and complex to conduct. Trial recruitment, participation, and data collection can burden participants and research personnel. In the past two decades, there have been rapid technological advances and an exponential growth in digitized healthcare data. Embedding RCTs, including cardiovascular outcome trials, into electronic health record systems or registries may streamline screening, consent, randomization, follow-up visits, and outcome adjudication. Moreover, wearable sensors (i.e. health and fitness trackers) provide an opportunity to collect data on cardiovascular health and risk factors in unprecedented detail and scale, while growing internet connectivity supports the collection of patient-reported outcomes. There is a pressing need to develop robust mechanisms that facilitate data capture from diverse databases and guidance to standardize data definitions. Importantly, the data collection infrastructure should be reusable to support multiple cardiovascular RCTs over time. Systems, processes, and policies will need to have sufficient flexibility to allow interoperability between different sources of data acquisition. Clinical research guidelines, ethics oversight, and regulatory requirements also need to evolve. This review highlights recent progress towards the use of routinely generated data to conduct RCTs and discusses potential solutions for ongoing barriers. There is a particular focus on methods to utilize routinely generated data for trials while complying with regional data protection laws. The discussion is supported with examples of cardiovascular outcome trials that have successfully leveraged the electronic health record, web-enabled devices or administrative databases to conduct randomized trials.

Affiliation(s)

Muhammad Shahzeb Khan: Division of Cardiology, Duke University School of Medicine, 2301 Erwin Rd., Durham, NC 27705, USA
Muhammad Shariq Usman: Department of Medicine, University of Mississippi Medical Center, 2500 N State St, Jackson, MS 39216, USA
Khawaja M Talha: Department of Medicine, University of Mississippi Medical Center, 2500 N State St, Jackson, MS 39216, USA
Harriette G C Van Spall: Department of Medicine, McMaster University, Hamilton, ON, Canada; Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada; Population Health Research Institute, Hamilton, ON, Canada
Stephen J Greene: Division of Cardiology, Duke University School of Medicine, 2301 Erwin Rd., Durham, NC 27705, USA; Duke Clinical Research Institute, Durham, NC, USA
Muthiah Vaduganathan: Cardiovascular Division, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Sadiya S Khan: Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
Nicholas L Mills: BHF Centre for Cardiovascular Science, University of Edinburgh, Chancellors Building, Royal Infirmary of Edinburgh, Edinburgh, Scotland, UK; Usher Institute, University of Edinburgh, Edinburgh, Scotland, UK
Ziad A Ali: DeMatteis Cardiovascular Institute, St Francis Hospital and Heart Center, Roslyn, NY, USA
Robert J Mentz: Division of Cardiology, Duke University School of Medicine, 2301 Erwin Rd., Durham, NC 27705, USA; Duke Clinical Research Institute, Durham, NC, USA
Gregg C Fonarow: Division of Cardiology, University of California Los Angeles, Los Angeles, CA, USA
Sunil V Rao: Division of Cardiology, New York University Langone Health System, New York, NY, USA
John A Spertus: Department of Cardiology, Saint Luke's Mid America Heart Institute, Kansas City, MO, USA; Kansas City's Healthcare Institute for Innovations in Quality, University of Missouri, Kansas City, MO, USA
Matthew T Roe: Division of Cardiology, Duke University School of Medicine, 2301 Erwin Rd., Durham, NC 27705, USA; Duke Clinical Research Institute, Durham, NC, USA
Stefan D Anker: Department of Cardiology (CVK), Berlin Institute of Health Center for Regenerative Therapies (BCRT), and German Centre for Cardiovascular Research (DZHK) Partner Site Berlin, Charité Universitätsmedizin, Berlin, Germany
Stefan K James: Department of Medical Sciences, Scientific Director UCR, Uppsala University, Uppsala, Uppland, Sweden
Javed Butler: Department of Medicine, University of Mississippi Medical Center, 2500 N State St, Jackson, MS 39216, USA; Baylor Scott & White Research Institute, Dallas, TX, USA
Darren K McGuire: Division of Cardiology, Department of Internal Medicine, UT Southwestern Medical Center and Parkland Health and Hospital System, Dallas, TX, USA

Rios-Doria E, Momeni-Boroujeni A, Friedman CF, Selenica P, Zhou Q, Wu M, Marra A, Leitao MM, Iasonos A, Alektiar KM, Sonoda Y, Makker V, Jewell E, Liu Y, Chi D, Zamarin D, Abu-Rustum NR, Aghajanian C, Mueller JJ, Ellenson LH, Weigelt B. Integration of clinical sequencing and immunohistochemistry for the molecular classification of endometrial carcinoma. Gynecol Oncol 2023;174:262-272. [PMID: 37245486 PMCID: PMC10402916 DOI: 10.1016/j.ygyno.2023.05.059] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 04/14/2023] [Accepted: 05/16/2023] [Indexed: 05/30/2023]

Abstract

PURPOSE

Using next generation sequencing (NGS), The Cancer Genome Atlas (TCGA) found that endometrial carcinomas (ECs) fall under one of four molecular subtypes, and a POLE mutation status, mismatch repair (MMR) and p53 immunohistochemistry (IHC)-based surrogate has been developed. We sought to retrospectively classify and characterize a large series of unselected ECs that were prospectively subjected to clinical sequencing by utilizing clinical molecular and IHC data.

EXPERIMENTAL DESIGN

All patients with EC with clinical tumor-normal MSK-IMPACT NGS from 2014 to 2020 (n = 2115) were classified by integrating molecular data (i.e., POLE mutation, TP53 mutation, MSIsensor score) and MMR and p53 IHC results. Survival analysis was performed for primary EC patients with upfront surgery at our institution.

RESULTS

Utilizing our integrated approach, significantly more ECs were molecularly classified (1834/2115, 87%) as compared to the surrogate (1387/2115, 66%, p < 0.001), with an almost perfect agreement for classifiable cases (Kappa 0.962, 95% CI 0.949-0.975). Discrepancies were primarily due to TP53 mutations in p53-IHC-normal ECs. Of the 1834 ECs, most were of copy number (CN)-high molecular subtype (40%), followed by CN-low (32%), MSI-high (23%) and POLE (5%). Histologic and genomic variability was present amongst all molecular subtypes. Molecular classification was prognostic in early- and advanced-stage disease, including early-stage endometrioid EC.

CONCLUSIONS

The integration of clinical NGS and IHC data allows for an algorithmic approach to molecularly classifying newly diagnosed EC, while overcoming issues of IHC-based genetic alteration detection. Such an integrated approach will be important moving forward given the prognostic and potentially predictive information afforded by this classification.
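The classification rates and the agreement statistic reported in this abstract can be illustrated with a short sketch. The first part only re-derives the stated percentages from the stated counts (1834/2115 and 1387/2115). The second part shows how Cohen's kappa is computed in general; the helper names and the two toy label lists are hypothetical and are not the study's data, so the toy kappa is unrelated to the reported 0.962.

```python
from collections import Counter


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences beyond chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the marginal label frequencies of each method
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1 - expected)


# Counts taken from the abstract
integrated_pct = percent(1834, 2115)  # integrated NGS + IHC approach
surrogate_pct = percent(1387, 2115)   # IHC-based surrogate

# Hypothetical toy subtype calls from two methods on six tumours
toy_integrated = ["POLE", "MSI-high", "CN-high", "CN-low", "CN-high", "CN-low"]
toy_surrogate = ["POLE", "MSI-high", "CN-high", "CN-low", "CN-high", "CN-high"]
toy_kappa = cohen_kappa(toy_integrated, toy_surrogate)

print(integrated_pct, surrogate_pct, round(toy_kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is reported alongside the simple classification rates.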

Affiliation(s)

Eric Rios-Doria: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Amir Momeni-Boroujeni: Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Claire F Friedman: Gynecologic Medical Oncology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Medicine, Weill Cornell Medical College, New York, NY, USA
Pier Selenica: Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Qin Zhou: Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Michelle Wu: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Antonio Marra: Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Mario M Leitao: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Surgery, Weill Cornell Medical College, New York, NY, USA
Alexia Iasonos: Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Kaled M Alektiar: Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Yukio Sonoda: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Surgery, Weill Cornell Medical College, New York, NY, USA
Vicky Makker: Gynecologic Medical Oncology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Medicine, Weill Cornell Medical College, New York, NY, USA
Elizabeth Jewell: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Surgery, Weill Cornell Medical College, New York, NY, USA
Ying Liu: Gynecologic Medical Oncology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Medicine, Weill Cornell Medical College, New York, NY, USA
Dennis Chi: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Surgery, Weill Cornell Medical College, New York, NY, USA
Dimitry Zamarin: Gynecologic Medical Oncology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Medicine, Weill Cornell Medical College, New York, NY, USA
Nadeem R Abu-Rustum: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Surgery, Weill Cornell Medical College, New York, NY, USA
Carol Aghajanian: Gynecologic Medical Oncology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Medicine, Weill Cornell Medical College, New York, NY, USA
Jennifer J Mueller: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Surgery, Weill Cornell Medical College, New York, NY, USA
Lora H Ellenson: Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Britta Weigelt: Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA

Kroenke K, Lam V, Ruddy KJ, Pachman DR, Herrin J, Rahman PA, Griffin JM, Cheville AL. Prevalence, Severity, and Co-Occurrence of SPPADE Symptoms in 31,866 Patients With Cancer. J Pain Symptom Manage 2023;65:367-377. [PMID: 36738867 PMCID: PMC10106386 DOI: 10.1016/j.jpainsymman.2023.01.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 01/21/2023] [Accepted: 01/24/2023] [Indexed: 02/05/2023]

Abstract

OBJECTIVES

To examine the prevalence, severity, and co-occurrence of SPPADE symptoms as well as their association with cancer type and patient characteristics.

BACKGROUND

The SPPADE symptoms (sleep disturbance, pain, physical function impairment, anxiety, depression, and low energy/fatigue) are prevalent, co-occurring, and undertreated in oncology and other clinical populations.

METHODS

Baseline SPPADE symptom data were analyzed from the E2C2 study, a stepped wedge pragmatic, population-level, cluster randomized clinical trial designed to evaluate a guideline-informed symptom management model targeting the six SPPADE symptoms. Symptom prevalence and severity were measured with a 0-10 numeric rating scale (NRS) for each of the six symptoms. Prevalence of severe (NRS ≥ 7) and potential clinically relevant (NRS ≥ 5) symptoms as well as co-occurrence of clinical symptoms were determined. Distribution-based methods were used to estimate the minimally important difference (MID). Associations of cancer type and patient characteristics with a SPPADE composite score were analyzed.

RESULTS

A total of 31,886 patients were assessed for SPPADE symptoms prior to, during, or soon after an outpatient medical oncology encounter. The proportion of patients with a potential clinically relevant symptom ranged from 17.5% for depression to 33.4% for fatigue. Co-occurrence of symptoms was high, with the proportion of patients with three or more additional clinically relevant symptoms ranging from 45.2% for fatigue to 68.6% for depression. The summed SPPADE composite score demonstrated good internal reliability (Cronbach's alpha of 0.86), with preliminary MID estimates of 4.1-4.3. Symptom burden differed across several types of cancer but was generally similar across most sociodemographic characteristics.

CONCLUSION

The high prevalence and co-occurrence of SPPADE symptoms in patients with all types of cancer warrants clinical approaches that optimize detection and management.
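
Two quantities named in the abstract above, the threshold-based prevalence (NRS ≥ 5 and NRS ≥ 7) and Cronbach's alpha for a summed composite score, can be illustrated with a minimal Python sketch. The function names and the toy scores are hypothetical and are not taken from the E2C2 study; this is one conventional way to compute these statistics, not the authors' analysis code.

```python
from statistics import pvariance


def prevalence_at_or_above(scores, threshold):
    """Proportion of patients whose 0-10 NRS score is >= threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per symptom,
    with patients in the same order in every list."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # per-patient composite score
    item_variance_sum = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical NRS scores for eight patients (illustration only).
fatigue = [0, 3, 5, 7, 9, 2, 6, 8]
print(prevalence_at_or_above(fatigue, 5))  # 0.625 (NRS >= 5)
print(prevalence_at_or_above(fatigue, 7))  # 0.375 (NRS >= 7)

# Two hypothetical symptom items for four patients.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))  # 0.75
```

Population variance is used for both the items and the totals; using sample variance for both would give the same alpha, because the scaling factor cancels in the ratio.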

Sarmet M, Kabani A, Coelho L, Dos Reis SS, Zeredo JL, Mehta AK. The use of natural language processing in palliative care research: A scoping review. Palliat Med 2023;37:275-290. [PMID: 36495082 DOI: 10.1177/02692163221141969] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/14/2022]

Abstract

BACKGROUND

Natural language processing has been increasingly used in palliative care research over the last 5 years for its versatility and accuracy.

AIM

To evaluate and characterize natural language processing use in palliative care research, including the most commonly used natural language processing software and computational methods, data sources, trends in natural language processing use over time, and palliative care topics addressed.

DESIGN

A scoping review using the framework by Arksey and O'Malley and the updated recommendations proposed by Levac et al. was conducted.

SOURCES

PubMed, Web of Science, Embase, Scopus, and IEEE Xplore databases were searched for palliative care studies that utilized natural language processing tools. Data on study characteristics and natural language processing instruments used were collected and relevant palliative care topics were identified.

RESULTS

197 relevant references were identified. Of these, 82 were included after full-text review. Studies were published in 48 different journals from 2007 to 2022. The average sample size was 21,541 (median 435). Thirty-two different natural language processing software and 33 machine-learning methods were identified. Nine main sources for data processing and 15 main palliative care topics across the included studies were identified. The most frequent topic was mortality and prognosis prediction. We also identified a trend where natural language processing was frequently used in analyzing clinical serious illness conversations extracted from audio recordings.

CONCLUSIONS

We found 82 papers on palliative care using natural language processing methods for a wide range of topics and sources of data that could expand the use of this methodology. We encourage researchers to consider incorporating this cutting-edge research methodology in future studies to improve published palliative care data.

Kotevski DP, Smee RI, Vajdic CM, Field M. Empirical comparison of routinely collected electronic health record data for head and neck cancer-specific survival in machine-learnt prognostic models. Head Neck 2023;45:365-379. [PMID: 36369773 PMCID: PMC10100433 DOI: 10.1002/hed.27241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 09/21/2022] [Accepted: 11/02/2022] [Indexed: 11/13/2022]

Nunez JJ, Leung B, Ho C, Bates AT, Ng RT. Predicting the Survival of Patients With Cancer From Their Initial Oncology Consultation Document Using Natural Language Processing. JAMA Netw Open 2023;6:e230813. [PMID: 36848085 PMCID: PMC9972192 DOI: 10.1001/jamanetworkopen.2023.0813] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/01/2023]

Abstract

IMPORTANCE

Predicting short- and long-term survival of patients with cancer may improve their care. Prior predictive models either use data with limited availability or predict the outcome of only 1 type of cancer.

OBJECTIVE

To investigate whether natural language processing can predict survival of patients with general cancer from a patient's initial oncologist consultation document.

DESIGN, SETTING, AND PARTICIPANTS

This retrospective prognostic study used data from 47 625 of 59 800 patients who started cancer care at any of the 6 BC Cancer sites located in the province of British Columbia between April 1, 2011, and December 31, 2016. Mortality data were updated until April 6, 2022, and data were analyzed from update until September 30, 2022. All patients with a medical or radiation oncologist consultation document generated within 180 days of diagnosis were included; patients seen for multiple cancers were excluded.

EXPOSURES

Initial oncologist consultation documents were analyzed using traditional and neural language models.

MAIN OUTCOMES AND MEASURES

The primary outcome was the performance of the predictive models, including balanced accuracy and receiver operating characteristics area under the curve (AUC). The secondary outcome was investigating what words the models used.

RESULTS

Of the 47 625 patients in the sample, 25 428 (53.4%) were female and 22 197 (46.6%) were male, with a mean (SD) age of 64.9 (13.7) years. A total of 41 447 patients (87.0%) survived 6 months, 31 143 (65.4%) survived 36 months, and 27 880 (58.5%) survived 60 months, calculated from their initial oncologist consultation. The best models achieved a balanced accuracy of 0.856 (AUC, 0.928) for predicting 6-month survival, 0.842 (AUC, 0.918) for 36-month survival, and 0.837 (AUC, 0.918) for 60-month survival, on a holdout test set. Differences in what words were important for predicting 6- vs 60-month survival were found.

CONCLUSIONS AND RELEVANCE

These findings suggest that models performed comparably with or better than previous models predicting cancer survival and that they may be able to predict survival using readily available data without focusing on 1 cancer type.
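
Balanced accuracy, the first performance measure named in the abstract above, is the mean of sensitivity and specificity, which matters when classes are imbalanced. The following Python sketch uses hypothetical labels in which 87% of cases are positive, loosely echoing the 6-month survival proportion reported above; the function names are illustrative and this is not the authors' evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical cohort: 87 survivors, 13 deaths, and a trivial model
# that predicts "survived" for everyone.
y_true = [1] * 87 + [0] * 13
y_pred = [1] * 100
print(accuracy(y_true, y_pred))           # 0.87
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The trivial model looks strong on plain accuracy but scores 0.5 on balanced accuracy, which is why the latter is a more informative summary for skewed outcomes.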

Kotevski DP, Smee RI, Field M, Broadley K, Vajdic CM. The Utility of Oncology Information Systems for Prognostic Modelling in Head and Neck Cancer. J Med Syst 2023;47:9. [PMID: 36640212 PMCID: PMC9840592 DOI: 10.1007/s10916-023-01907-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 01/03/2023] [Indexed: 01/15/2023]

Abstract

Cancer centres rely on electronic information in oncology information systems (OIS) to guide patient care. We investigated the completeness and accuracy of routinely collected head and neck cancer (HNC) data sourced from an OIS for suitability in prognostic modelling and other research. Three hundred and fifty-three adults diagnosed from 2000 to 2017 with head and neck squamous cell carcinoma, treated with radiotherapy, were eligible. Thirteen clinically relevant variables in HNC prognosis were extracted from a single-centre OIS and compared to that compiled separately in a research dataset. These two datasets were compared for agreement using Cohen's kappa coefficient for categorical variables, and intraclass correlation coefficients for continuous variables. Research data was 96% complete compared to 84% for OIS data. Agreement was perfect for gender (κ = 1.000), high for age (κ = 0.993), site (κ = 0.992), T (κ = 0.851) and N (κ = 0.812) stage, radiotherapy dose (κ = 0.889), fractions (κ = 0.856), and duration (κ = 0.818), and chemotherapy treatment (κ = 0.871), substantial for overall stage (κ = 0.791) and vital status (κ = 0.689), moderate for grade (κ = 0.547), and poor for performance status (κ = 0.110). Thirty-one other variables were poorly captured and could not be statistically compared. Documentation of clinical information within the OIS for HNC patients is routine practice; however, OIS data was less correct and complete than data collected for research purposes. Substandard collection of routine data may hinder advancements in patient care. Improved data entry, integration with clinical activities and workflows, system usability, data dictionaries, and training are necessary for OIS data to generate robust research. Data mining from clinical documents may supplement structured data collection.
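
Cohen's kappa, the agreement statistic used above for categorical variables, corrects raw agreement for the agreement expected by chance. A minimal Python sketch follows; the function name and the toy records are hypothetical, and the study's own computation is not shown in the source.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical values,
    for example the same variable recorded in two datasets."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of marginal proportions, summed over labels.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    # Undefined (division by zero) if both sources use a single identical label.
    return (observed - expected) / (1 - expected)


# Hypothetical vital-status values from two sources (illustration only).
ois = ["alive", "dead", "alive", "dead", "alive", "dead"]
research = ["alive", "dead", "alive", "dead", "dead", "dead"]
print(cohens_kappa(ois, research))
```

Identical lists give a kappa of 1, and agreement no better than chance gives a kappa of 0.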

Kotevski DP, Smee RI, Vajdic CM, Field M. Machine Learning and Nomogram Prognostic Modeling for 2-Year Head and Neck Cancer-Specific Survival Using Electronic Health Record Data: A Multisite Study. JCO Clin Cancer Inform 2023;7:e2200128. [PMID: 36596211 DOI: 10.1200/cci.22.00128] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]

Abstract

PURPOSE

There is limited knowledge of the prediction of 2-year cancer-specific survival (CSS) in the head and neck cancer (HNC) population. The aim of this study is to develop and validate machine learning models and a nomogram for the prediction of 2-year CSS in patients with HNC using real-world data collected by major teaching and tertiary referral hospitals in New South Wales (NSW), Australia.

MATERIALS AND METHODS

Data collected in oncology information systems at multiple NSW Cancer Centres were extracted for 2,953 eligible adults diagnosed between 2000 and 2017 with squamous cell carcinoma of the head and neck. Death data were sourced from the National Death Index using record linkage. Machine learning and Cox regression/nomogram models were developed and internally validated in Python and R, respectively.

RESULTS

Machine learning models demonstrated highest performance (C-index) in the larynx and nasopharynx cohorts (0.82), followed by the oropharynx (0.79) and the hypopharynx and oral cavity cohorts (0.73). In the whole HNC population, C-indexes of 0.79 and 0.70 and Brier scores of 0.10 and 0.27 were reported for the machine learning and nomogram model, respectively. Cox regression analysis identified age, T and N classification, and time-corrected biologic equivalent dose in two gray fractions as independent prognostic factors for 2-year CSS. N classification was the most important feature used for prediction in the machine learning model followed by age.

CONCLUSION

Machine learning and nomogram analysis predicted 2-year CSS with high performance using routinely collected and complete clinical information extracted from oncology information systems. These models function as visual decision-making tools to guide radiotherapy treatment decisions and provide insight into the prediction of survival outcomes in patients with HNC.
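
The Brier score quoted in the results above is the mean squared difference between a predicted probability and the observed binary outcome, so lower is better. The Python sketch below shows the basic definition for a fixed-horizon binary outcome; the function name and the toy values are hypothetical, and it does not handle censoring, which a survival analysis may need to address.

```python
def brier_score(outcomes, probabilities):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, probabilities)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical 2-year outcomes (1 = event) and predicted probabilities.
outcomes = [1, 0, 0, 1]
predicted = [0.9, 0.2, 0.1, 0.7]
print(brier_score(outcomes, predicted))

# Reference point: always predicting 0.5 scores 0.25 on any binary outcome.
print(brier_score(outcomes, [0.5] * 4))  # 0.25
```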

Fang C, Markuzon N, Patel N, Rueda JD. Natural Language Processing for Automated Classification of Qualitative Data From Interviews of Patients With Cancer. Value Health 2022;25:1995-2002. [PMID: 35840523 DOI: 10.1016/j.jval.2022.06.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 05/19/2022] [Accepted: 06/12/2022] [Indexed: 06/15/2023]

Abstract

OBJECTIVES

This study sought to explore the use of novel natural language processing (NLP) methods for classifying unstructured, qualitative textual data from interviews of patients with cancer to identify patient-reported symptoms and impacts on quality of life.

METHODS

We tested the ability of 4 NLP models to accurately classify text from interview transcripts as "symptom," "quality of life impact," and "other." Interview data sets from patients with hepatocellular carcinoma (HCC) (n = 25), biliary tract cancer (BTC) (n = 23), and gastric cancer (n = 24) were used. Models were cross-validated with transcript subsets designated for training, validation, and testing. Multiclass classification performance of the 4 models was evaluated at paragraph and sentence level using the HCC testing data set and analyzed by the one-versus-rest technique quantified by the receiver operating characteristic area under the curve (ROC AUC) score.

RESULTS

NLP models accurately classified multiclass text from patient interviews. The Bidirectional Encoder Representations from Transformers model generally outperformed all other models at paragraph and sentence level. The highest predictive performance of the Bidirectional Encoder Representations from Transformers model was observed using the HCC data set to train and BTC data set to test (mean ROC AUC, 0.940 [SD 0.028]), with similarly high predictive performance using balanced and imbalanced training data sets from BTC and gastric cancer populations.

CONCLUSIONS

NLP models were accurate in predicting multiclass classification of text from interviews of patients with cancer, with most surpassing 0.9 ROC AUC at paragraph level. NLP may be a useful tool for scaling up processing of patient interviews in clinical studies and, thus, could serve to facilitate patient input into drug development and improving patient care.
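
The one-versus-rest ROC AUC evaluation described in the methods above treats each class in turn as the positive class and all others as negative. The Python sketch below computes AUC as the probability that a positive example outscores a negative one; the function names, class labels, and scores are hypothetical and do not come from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, scores_by_class):
    """Per-class AUC: each class against all remaining classes."""
    return {
        cls: roc_auc([int(y == cls) for y in y_true], scores)
        for cls, scores in scores_by_class.items()
    }


# Hypothetical model scores for four text segments (illustration only).
y_true = ["symptom", "impact", "other", "symptom"]
scores_by_class = {
    "symptom": [0.80, 0.10, 0.20, 0.15],
    "impact": [0.10, 0.70, 0.30, 0.30],
    "other": [0.10, 0.20, 0.50, 0.10],
}
print(one_vs_rest_auc(y_true, scores_by_class))
# {'symptom': 0.75, 'impact': 1.0, 'other': 1.0}
```

Averaging the per-class values gives a single summary figure; whether the study averaged them, and how, is not stated in the abstract.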

Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022;6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Revised: 03/18/2022] [Accepted: 06/15/2022] [Indexed: 11/20/2022]

Lindvall C, Deng CY, Agaronnik ND, Kwok A, Samineni S, Umeton R, Mackie-Jenkins W, Kehl KL, Tulsky JA, Enzinger AC. Deep Learning for Cancer Symptoms Monitoring on the Basis of Electronic Health Record Unstructured Clinical Notes. JCO Clin Cancer Inform 2022;6:e2100136. [PMID: 35714301 PMCID: PMC9232368 DOI: 10.1200/cci.21.00136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]

Abstract

PURPOSE

Symptoms are vital outcomes for cancer clinical trials, observational research, and population-level surveillance. Patient-reported outcomes (PROs) are valuable for monitoring symptoms, yet there are many challenges to collecting PROs at scale. We sought to develop, test, and externally validate a deep learning model to extract symptoms from unstructured clinical notes in the electronic health record.

METHODS

We randomly selected 1,225 outpatient progress notes from among patients treated at the Dana-Farber Cancer Institute between January 2016 and December 2019 and used 1,125 notes as our training/validation data set and 100 notes as our test data set. We evaluated the performance of 10 deep learning models for detecting 80 symptoms included in the National Cancer Institute's Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) framework. Model performance as compared with manual chart abstraction was assessed using standard metrics, and the highest performer was externally validated on a sample of 100 physician notes from a different clinical context.

RESULTS

In our training and test data sets, 75 of the 80 candidate symptoms were identified. The ELECTRA-small model had the highest performance for symptom identification at the token level (ie, at the individual symptom level), with an F1 of 0.87 and a processing time of 3.95 seconds per note. For the 10 most common symptoms in the test data set, the F1 score ranged from 0.98 for anxious to 0.86 for fatigue. For external validation of the same symptoms, the note-level performance ranged from F1 = 0.97 for diarrhea and dizziness to F1 = 0.73 for swelling.

CONCLUSION

Training a deep learning model to identify a wide range of electronic health record-documented symptoms relevant to cancer care is feasible. This approach could be used at the health system scale to complement electronic PROs.
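
The F1 score used above is the harmonic mean of precision and recall. As a rough illustration, the Python sketch below scores one note by comparing the set of symptoms a model extracted against the set found by manual chart abstraction. The function name and symptom sets are hypothetical, and the study's token-level scoring may differ from this simplified note-level comparison.

```python
def f1_score(gold, predicted):
    """F1 for one note, comparing two collections of symptom labels."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # symptoms found by both
    fp = len(predicted - gold)   # extracted but not in the chart review
    fn = len(gold - predicted)   # in the chart review but missed
    if tp == 0:
        return 0.0
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical labels for a single note (illustration only).
gold = {"fatigue", "nausea", "pain"}
predicted = {"fatigue", "pain", "anxious"}
print(round(f1_score(gold, predicted), 3))  # 0.667
```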

Shreve JT, Khanani SA, Haddad TC. Artificial Intelligence in Oncology: Current Capabilities, Future Opportunities, and Ethical Considerations. Am Soc Clin Oncol Educ Book 2022;42:1-10. [PMID: 35687826 DOI: 10.1200/edbk_350652] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 01/04/2023]

Abstract

The promise of highly personalized oncology care using artificial intelligence (AI) technologies has been forecasted since the emergence of the field. Cumulative advances across the science are bringing this promise to realization, including refinement of machine learning- and deep learning algorithms; expansion in the depth and variety of databases, including multiomics; and the decreased cost of massively parallelized computational power. Examples of successful clinical applications of AI can be found throughout the cancer continuum and in multidisciplinary practice, with computer vision-assisted image analysis in particular having several U.S. Food and Drug Administration-approved uses. Techniques with emerging clinical utility include whole blood multicancer detection from deep sequencing, virtual biopsies, natural language processing to infer health trajectories from medical notes, and advanced clinical decision support systems that combine genomics and clinomics. Substantial issues have delayed broad adoption, with data transparency and interpretability suffering from AI's "black box" mechanism, and intrinsic bias against underrepresented persons limiting the reproducibility of AI models and perpetuating health care disparities. Midfuture projections of AI maturation involve increasing a model's complexity by using multimodal data elements to better approximate an organic system. Far-future positing includes living databases that accumulate all aspects of a person's health into discrete data elements; this will fuel highly convoluted modeling that can tailor treatment selection, dose determination, surveillance modality and schedule, and more. The field of AI has had a historical dichotomy between its proponents and detractors. The successful development of recent applications, and continued investment in prospective validation that defines their impact on multilevel outcomes, has established a momentum of accelerated progress.

Kehl KL, Xu W, Gusev A, Bakouny Z, Choueiri TK, Riaz IB, Elmarakeby H, Van Allen EM, Schrag D. Artificial intelligence-aided clinical annotation of a large multi-cancer genomic dataset. Nat Commun 2021;12:7304. [PMID: 34911934 PMCID: PMC8674229 DOI: 10.1038/s41467-021-27358-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/14/2021] [Accepted: 11/16/2021] [Indexed: 02/08/2023]

Ma C, Sridharan M, Al-Sayegh H, Li A, Guo D, Auclair M, Kuragayala V, Bandaru C, Milne D, Cruse H, Beaudoin R, Orechia J, Bickel J, London WB. Building a Harmonized Datamart by Integrating Cross-Institutional Systems of Clinical, Outcome, and Genomic Data: The Pediatric Patient Informatics Platform (PPIP). JCO Clin Cancer Inform 2021;5:202-215. [PMID: 33591797 DOI: 10.1200/cci.20.00083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]

Abstract

PURPOSE

Siloed electronic medical data limits utility and accessibility. At the Dana-Farber/Boston Children's Cancer and Blood Disorders Center, cross-institutional data were inconsistent and difficult to access. To unify data for clinical operations, administration, and research, we developed the Pediatric Patient Informatics Platform (PPIP), an integrated datamart harmonizing multiple source systems across two institutions into a common technology.

PATIENTS AND METHODS

Starting in 2009, user requirements were gathered and data sources were prioritized. Project teams, including biostatisticians, database developers, and an external contractor, were formed. Read-access to source systems was established. The 3-layer PPIP architecture was developed: STAGING, a near-exact copy of source data; INTEGRATION, where data were reorganized into domains; and CONSUMPTION, where data were optimized for rapid retrieval. The diverse systems were integrated into a common IBM Netezza technology. Data filters were defined to accurately capture the Center's patients, and derived data items were created for harmonization across sources. An interactive online query tool, PPIP360, was developed using Microstrategy Analytics.

RESULTS

Driven by scientific objectives, the PPIP datamart was created, including 33,674 patients, 2,983 protocols, and 3.6 million patient visits from 14 source databases, 164 source tables, and 2,622 source data items. The PPIP360 has 605 data items and 33 metrics across 11 reports and dashboards. Dana-Farber and Boston Children's established a legal data-sharing agreement. The PPIP has supported hundreds of faculty, staff, and projects, including planning clinical trials and informing strategic planning.

CONCLUSION

The PPIP has successfully harmonized and integrated diagnostic, demographic, laboratory, treatment, clinical outcome, pathology, transplant, meta-protocol, and -omics data, for efficient, daily operational and research activities at Dana-Farber/Boston Children's Cancer and Blood Disorders Center, and future external sharing.

Alkaitis MS, Agrawal MN, Riely GJ, Razavi P, Sontag D. Automated NLP Extraction of Clinical Rationale for Treatment Discontinuation in Breast Cancer. JCO Clin Cancer Inform 2021;5:550-560. [PMID: 33989016 DOI: 10.1200/cci.20.00139] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022]

Abstract

PURPOSE

Key oncology end points are not routinely encoded into electronic medical records (EMRs). We assessed whether natural language processing (NLP) can abstract treatment discontinuation rationale from unstructured EMR notes to estimate toxicity incidence and progression-free survival (PFS).

METHODS

We constructed a retrospective cohort of 6,115 patients with early-stage and 701 patients with metastatic breast cancer initiating care at Memorial Sloan Kettering Cancer Center from 2008 to 2019. Each cohort was divided into training (70%), validation (15%), and test (15%) subsets. Human abstractors identified the clinical rationale associated with treatment discontinuation events. Concatenated EMR notes were used to train high-dimensional logistic regression and convolutional neural network models. Kaplan-Meier analyses were used to compare toxicity incidence and PFS estimated by our NLP models to estimates generated by manual labeling and time-to-treatment discontinuation (TTD).

RESULTS

Our best high-dimensional logistic regression models identified toxicity events in early-stage patients with an area under the curve of the receiver-operator characteristic of 0.857 ± 0.014 (standard deviation) and progression events in metastatic patients with an area under the curve of 0.752 ± 0.027 (standard deviation). NLP-extracted toxicity incidence and PFS curves were not significantly different from manually extracted curves (P = .95 and P = .67, respectively). By contrast, TTD overestimated toxicity in early-stage patients (P < .001) and underestimated PFS in metastatic patients (P < .001). Additionally, we tested an extrapolation approach in which 20% of the metastatic cohort were labeled manually, and NLP algorithms were used to abstract the remaining 80%. This extrapolated outcomes approach resolved PFS differences between receptor subtypes (P < .001 for hormone receptor+/human epidermal growth factor receptor 2- v human epidermal growth factor receptor 2+ v triple-negative) that could not be resolved with TTD.

CONCLUSION

NLP models are capable of abstracting treatment discontinuation rationale with minimal manual labeling.

Ronquillo JG, Lester WT. Practical Aspects of Implementing and Applying Health Care Cloud Computing Services and Informatics to Cancer Clinical Trial Data. JCO Clin Cancer Inform 2021;5:826-832. [PMID: 34383582 DOI: 10.1200/cci.21.00018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]

Abstract

PURPOSE

Cloud computing has led to dramatic growth in the volume, variety, and velocity of cancer data. However, cloud platforms and services present new challenges for cancer research, particularly in understanding the practical tradeoffs between cloud performance, cost, and complexity. The goal of this study was to describe the practical challenges when using a cloud-based service to improve the cancer clinical trial matching process.

METHODS

We collected information for all interventional cancer clinical trials from ClinicalTrials.gov and used the Google Cloud Healthcare Natural Language Application Programming Interface (API) to analyze clinical trial Title and Eligibility Criteria text. An informatics pipeline leveraging interoperability standards summarized the distribution of cancer clinical trials, genes, laboratory tests, and medications extracted from cloud-based entity analysis.

RESULTS

There were a total of 38,851 cancer-related clinical trials found in this study, with the distribution of cancer categories extracted from Title text significantly different than in ClinicalTrials.gov (P < .001). Cloud-based entity analysis of clinical trial criteria identified a total of 949 genes, 1,782 laboratory tests, 2,086 medications, and 4,902 National Cancer Institute Thesaurus terms, with estimated detection accuracies ranging from 12.8% to 89.9%. A total of 77,702 API calls processed an estimated 167,179 text records, which took a total of 1,979 processing-minutes (33.0 processing-hours), or approximately 1.5 seconds per API call.

CONCLUSION

Current general-purpose cloud health care tools, like the Google service in this study, should not be used for automated clinical trial matching unless they can perform effective extraction and classification of the clinical, genetic, and medication concepts central to precision oncology research. A strong understanding of the practical aspects of cloud computing will help researchers effectively navigate the vast data ecosystems in cancer research.
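
The throughput figures in the results above can be checked with simple arithmetic. The short Python sketch below reuses only the numbers quoted in the abstract; the variable names are illustrative.

```python
# Figures quoted in the abstract above.
api_calls = 77_702
text_records = 167_179
processing_minutes = 1_979

processing_hours = processing_minutes / 60
seconds_per_call = processing_minutes * 60 / api_calls
records_per_call = text_records / api_calls

print(round(processing_hours, 1))   # 33.0
print(round(seconds_per_call, 1))   # 1.5
print(round(records_per_call, 1))   # 2.2
```

The first two results match the 33.0 processing-hours and roughly 1.5 seconds per API call reported above; the records-per-call ratio is derived here and is not stated in the abstract.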

Kehl KL, Riely GJ, Lepisto EM, Lavery JA, Warner JL, LeNoue-Newton ML, Sweeney SM, Rudolph JE, Brown S, Yu C, Bedard PL, Schrag D, Panageas KS. Correlation Between Surrogate End Points and Overall Survival in a Multi-institutional Clinicogenomic Cohort of Patients With Non-Small Cell Lung or Colorectal Cancer. JAMA Netw Open 2021;4:e2117547. [PMID: 34309669 PMCID: PMC8314138 DOI: 10.1001/jamanetworkopen.2021.17547] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 02/01/2023]

Abstract

IMPORTANCE

Contemporary observational cancer research requires associating genomic biomarkers with reproducible end points; overall survival (OS) is a key end point, but interpretation can be challenging when multiple lines of therapy and prolonged survival are common. Progression-free survival (PFS), time to treatment discontinuation (TTD), and time to next treatment (TTNT) are alternative end points, but their utility as surrogates for OS in real-world clinicogenomic data sets has not been well characterized.

OBJECTIVE

To measure correlations between candidate surrogate end points and OS in a multi-institutional clinicogenomic data set.

DESIGN, SETTING, AND PARTICIPANTS

A retrospective cohort study was conducted of patients with non-small cell lung cancer (NSCLC) or colorectal cancer (CRC) whose tumors were genotyped at 4 academic centers from January 1, 2014, to December 31, 2017, and who initiated systemic therapy for advanced disease. Patients were followed up through August 31, 2020 (NSCLC), and October 31, 2020 (CRC). Statistical analyses were conducted on January 5, 2021.

EXPOSURES

Candidate surrogate end points included TTD; TTNT; PFS based on imaging reports only; PFS based on medical oncologist ascertainment only; PFS based on either imaging or medical oncologist ascertainment, whichever came first; and PFS defined by a requirement that both imaging and medical oncologist ascertainment have indicated progression.

MAIN OUTCOMES AND MEASURES

The primary outcome was the correlation between candidate surrogate end points and OS.

RESULTS

There were 1161 patients with NSCLC (648 women [55.8%]; mean [SD] age, 63 [11] years) and 1150 with CRC (647 men [56.3%]; mean [SD] age, 54 [12] years) identified for analysis. Progression-free survival based on both imaging and medical oncologist documentation was most correlated with OS (NSCLC: ρ = 0.76; 95% CI, 0.73-0.79; CRC: ρ = 0.73; 95% CI, 0.69-0.75). Time to treatment discontinuation was least associated with OS (NSCLC: ρ = 0.45; 95% CI, 0.40-0.50; CRC: ρ = 0.13; 95% CI, 0.06-0.19). Time to next treatment was modestly associated with OS (NSCLC: ρ = 0.60; 95% CI, 0.55-0.64; CRC: ρ = 0.39; 95% CI, 0.32-0.46).

CONCLUSIONS AND RELEVANCE

This cohort study suggests that PFS based on both a radiologist and a treating oncologist determining that a progression event has occurred was the surrogate end point most highly correlated with OS for analysis of observational clinicogenomic data.

Affiliation(s)

Kenneth L. Kehl: Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts
Gregory J. Riely: Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
Eva M. Lepisto: Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts
Jessica A. Lavery: Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
Jeremy L. Warner: Department of Medicine, Division of Hematology/Oncology, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee; Department of Biomedical Informatics, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee
Michele L. LeNoue-Newton: Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee
Shawn M. Sweeney: American Association for Cancer Research, Philadelphia, Pennsylvania
Julia E. Rudolph: Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
Samantha Brown: Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
Celeste Yu: Division of Medical Oncology & Hematology, Princess Margaret Cancer Centre/University Health Network, Toronto, Ontario, Canada
Philippe L. Bedard: Division of Medical Oncology & Hematology, Princess Margaret Cancer Centre/University Health Network, Toronto, Ontario, Canada; Department of Medicine, University of Toronto, Toronto, Ontario, Canada
Deborah Schrag: Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts; Associate Editor, JAMA
Katherine S. Panageas: Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York

Saini KS, Twelves C. Determining lines of therapy in patients with solid cancers: a proposed new systematic and comprehensive framework. Br J Cancer 2021;125:155-163. [PMID: 33850304 PMCID: PMC8292475 DOI: 10.1038/s41416-021-01319-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/18/2020] [Revised: 01/25/2021] [Accepted: 02/10/2021] [Indexed: 12/18/2022] Open

Momeni-Boroujeni A, Dahoud W, Vanderbilt CM, Chiang S, Murali R, Rios-Doria EV, Alektiar KM, Aghajanian C, Abu-Rustum NR, Ladanyi M, Ellenson LH, Weigelt B, Soslow RA. Clinicopathologic and Genomic Analysis of TP53-Mutated Endometrial Carcinomas. Clin Cancer Res 2021;27:2613-2623. [PMID: 33602681 PMCID: PMC8530276 DOI: 10.1158/1078-0432.ccr-20-4436] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 11/13/2020] [Revised: 01/16/2021] [Accepted: 02/11/2021] [Indexed: 11/16/2022]

Abstract

PURPOSE

Copy number-high endometrial carcinomas were described by The Cancer Genome Atlas as high-grade endometrioid and serous cancers showing frequent copy-number alterations (CNA), low mutational burden (i.e., non-hypermutant), near-universal TP53 mutation, and unfavorable clinical outcomes. We sought to investigate and compare the clinicopathologic and molecular characteristics of non-hypermutant TP53-altered endometrial carcinomas of four histologic types.

EXPERIMENTAL DESIGN

TP53-mutated endometrial carcinomas, defined as TP53-mutant tumors lacking microsatellite instability or pathogenic POLE mutations, were identified (n = 238) in a cohort of 1,239 endometrial carcinomas subjected to clinical massively parallel sequencing of 410-468 cancer-related genes. Somatic mutations and CNAs (n = 238), and clinicopathologic features were determined (n = 185, initial treatment planning at our institution).

RESULTS

TP53-mutated endometrial carcinomas encompassed uterine serous (n = 102, 55.1%), high-grade endometrial carcinoma with ambiguous features/not otherwise specified (EC-NOS; n = 44, 23.8%), endometrioid carcinomas of all tumor grades (n = 28, 15.1%), and clear cell carcinomas (n = 11, 5.9%). PTEN mutations were significantly more frequent in endometrioid carcinomas, SPOP mutations in clear cell carcinomas, and CCNE1 amplification in serous carcinomas/EC-NOS; however, none of these genomic alterations were exclusive to any given histologic type. ERBB2 amplification was present at similar frequencies across TP53-mutated histologic types (7.7%-18.6%). Although overall survival was similar across histologic types, serous carcinomas presented more frequently at stage IV, had more persistent and/or recurrent disease, and reduced disease-free survival.

CONCLUSIONS

TP53-mutated endometrial carcinomas display clinical and molecular similarities across histologic subtypes. Our data provide evidence to suggest performance of ERBB2 assessment in all TP53-mutated endometrial carcinomas. Given the distinct clinical features of serous carcinomas, histologic classification continues to be relevant.

Evolution of Hematology Clinical Trial Adverse Event Reporting to Improve Care Delivery. Curr Hematol Malig Rep 2021;16:126-131. [PMID: 33786724 DOI: 10.1007/s11899-021-00627-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 03/21/2021] [Indexed: 10/21/2022]

Sorin V, Barash Y, Konen E, Klang E. Deep-learning natural language processing for oncological applications. Lancet Oncol 2020;21:1553-1556. [DOI: 10.1016/s1470-2045(20)30615-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/30/2020] [Accepted: 10/05/2020] [Indexed: 10/22/2022]