1
Sarwal D, Wang L, Gandhi S, Sagheb Hossein Pour E, Janssens LP, Delgado AM, Doering KA, Mishra AK, Greenwood JD, Liu H, Majumder S. Identification of pancreatic cancer risk factors from clinical notes using natural language processing. Pancreatology 2024:S1424-3903(24)00075-9. [PMID: 38693040 DOI: 10.1016/j.pan.2024.03.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 03/20/2024] [Accepted: 03/23/2024] [Indexed: 05/03/2024]
Abstract
OBJECTIVES Screening for pancreatic ductal adenocarcinoma (PDAC) is considered in high-risk individuals (HRIs) with established PDAC risk factors, such as family history and germline mutations in PDAC susceptibility genes. Accurate assessment of risk factor status is provider knowledge-dependent and requires extensive manual chart review by experts. Natural Language Processing (NLP) has shown promise in automated data extraction from the electronic health record (EHR). We aimed to use NLP for automated extraction of PDAC risk factors from unstructured clinical notes in the EHR. METHODS We first developed rule-based NLP algorithms to extract PDAC risk factors at the document-level, using an annotated corpus of 2091 clinical notes. Next, we further improved the NLP algorithms using a cohort of 1138 patients through patient-level training, validation, and testing, with comparison against a pre-specified reference standard. To minimize false-negative results we prioritized algorithm recall. RESULTS In the test set (n = 807), the NLP algorithms achieved a recall of 0.933, precision of 0.790, and F1-score of 0.856 for family history of PDAC. For germline genetic mutations, the algorithm had a high recall of 0.851, while precision and F1-score were lower at 0.350 and 0.496 respectively. Most false positives for germline mutations resulted from erroneous recognition of tissue mutations. CONCLUSIONS Rule-based NLP algorithms applied to unstructured clinical notes are highly sensitive for automated identification of PDAC risk factors. Further validation in a large primary-care patient population is warranted to assess real-world utility in identifying HRIs for pancreatic cancer screening.
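The reported F1-scores can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch, not code from the study; the function name `f1_score` is an illustrative choice, and the inputs are the rounded test-set values quoted in the abstract above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported in the abstract (test set, n = 807).
family_history_f1 = f1_score(precision=0.790, recall=0.933)
germline_f1 = f1_score(precision=0.350, recall=0.851)

print(round(family_history_f1, 3))  # 0.856
print(round(germline_f1, 3))        # 0.496
```

Both results agree with the figures in the abstract. The gap between them shows how low precision pulls F1 down even when recall stays high, which is the trade-off the authors accepted by prioritizing recall.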
Affiliation(s)
- Dhruv Sarwal
  - Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
- Liwei Wang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Sonal Gandhi
  - Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
- Laurens P Janssens
  - Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
- Adriana M Delgado
  - Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
- Karen A Doering
  - Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
- Anup Kumar Mishra
  - Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
- Jason D Greenwood
  - Department of Family Medicine, Mayo Clinic, Rochester, MN, USA; Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Shounak Majumder
  - Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
2
Daher H, Punchayil SA, Ismail AAE, Fernandes RR, Jacob J, Algazzar MH, Mansour M. Advancements in Pancreatic Cancer Detection: Integrating Biomarkers, Imaging Technologies, and Machine Learning for Early Diagnosis. Cureus 2024; 16:e56583. [PMID: 38646386 PMCID: PMC11031195 DOI: 10.7759/cureus.56583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/20/2024] [Indexed: 04/23/2024]
Abstract
Artificial intelligence (AI) has come to play a pivotal role in revolutionizing medical practices, particularly in the field of pancreatic cancer detection and management. As a leading cause of cancer-related deaths, pancreatic cancer warrants innovative approaches due to its typically advanced stage at diagnosis and dismal survival rates. Present detection methods, constrained by limitations in accuracy and efficiency, underscore the necessity for novel solutions. AI-driven methodologies present promising avenues for enhancing early detection and prognosis forecasting. Through the analysis of imaging data, biomarker profiles, and clinical information, AI algorithms excel in discerning subtle abnormalities indicative of pancreatic cancer with remarkable precision. Moreover, machine learning (ML) algorithms facilitate the amalgamation of diverse data sources to optimize patient care. However, despite its huge potential, the implementation of AI in pancreatic cancer detection faces various challenges. Issues such as the scarcity of comprehensive datasets, biases in algorithm development, and concerns regarding data privacy and security necessitate thorough scrutiny. While AI offers immense promise in transforming pancreatic cancer detection and management, ongoing research and collaborative efforts are indispensable in overcoming technical hurdles and ethical dilemmas. This review delves into the evolution of AI, its application in pancreatic cancer detection, and the challenges and ethical considerations inherent in its integration.
Affiliation(s)
- Hisham Daher
  - Internal Medicine, University of Debrecen, Debrecen, HUN
- Sneha A Punchayil
  - Internal Medicine, University Hospital of North Tees, Stockton-on-Tees, GBR
- Joel Jacob
  - General Medicine, Diana Princess of Wales Hospital, Grimsby, GBR
- Mohammad Mansour
  - General Medicine, University of Debrecen, Debrecen, HUN
  - General Medicine, Jordan University Hospital, Amman, JOR
3
Claridge H, Price CA, Ali R, Cooke EA, de Lusignan S, Harvey-Sullivan A, Hodges C, Khalaf N, O'Callaghan D, Stunt A, Thomas SA, Thomson J, Lemanska A. Determining the feasibility of calculating pancreatic cancer risk scores for people with new-onset diabetes in primary care (DEFEND PRIME): study protocol. BMJ Open 2024; 14:e079863. [PMID: 38262635 PMCID: PMC10806670 DOI: 10.1136/bmjopen-2023-079863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 01/04/2024] [Indexed: 01/25/2024]
Abstract
INTRODUCTION Worldwide, pancreatic cancer has a poor prognosis. Early diagnosis may improve survival by enabling curative treatment. Statistical and machine learning diagnostic prediction models using risk factors such as patient demographics and blood tests are being developed for clinical use to improve early diagnosis. One example is the Enriching New-onset Diabetes for Pancreatic Cancer (ENDPAC) model, which employs patients' age, blood glucose and weight changes to provide pancreatic cancer risk scores. These values are routinely collected in primary care in the UK. Primary care's central role in cancer diagnosis makes it an ideal setting to implement ENDPAC but it has yet to be used in clinical settings. This study aims to determine the feasibility of applying ENDPAC to data held by UK primary care practices. METHODS AND ANALYSIS This will be a multicentre observational study with a cohort design, determining the feasibility of applying ENDPAC in UK primary care. We will develop software to search, extract and process anonymised data from 20 primary care providers' electronic patient record management systems on participants aged 50+ years, with a glycated haemoglobin (HbA1c) test result of ≥48 mmol/mol (6.5%) and no previous abnormal HbA1c results. Software to calculate ENDPAC scores will be developed, and descriptive statistics used to summarise the cohort's demographics and assess data quality. Findings will inform the development of a future UK clinical trial to test ENDPAC's effectiveness for the early detection of pancreatic cancer. ETHICS AND DISSEMINATION This project has been reviewed by the University of Surrey University Ethics Committee and received a favourable ethical opinion (FHMS 22-23151 EGA). Study findings will be presented at scientific meetings and published in international peer-reviewed journals. Participating primary care practices, clinical leads and policy makers will be provided with summaries of the findings.
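The cohort definition in the methods above can be read as a simple record filter. The sketch below is one possible reading under stated assumptions, not the study's software. The names `PatientRecord`, `is_eligible`, and `hba1c_mmol_mol_to_percent` are hypothetical, "abnormal" is assumed to mean at or above the same 48 mmol/mol threshold, and the unit conversion uses the standard IFCC-to-NGSP master equation. It does not compute ENDPAC scores.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the protocol abstract.
HBA1C_THRESHOLD_MMOL_MOL = 48.0
MIN_AGE_YEARS = 50


def hba1c_mmol_mol_to_percent(value_mmol_mol: float) -> float:
    """Convert IFCC units (mmol/mol) to NGSP units (%) via the master equation."""
    return 0.09148 * value_mmol_mol + 2.152


@dataclass
class PatientRecord:
    """Hypothetical, simplified record; real EHR extracts will differ."""
    age_years: int
    hba1c_history_mmol_mol: List[float]  # chronological, most recent last


def is_eligible(record: PatientRecord) -> bool:
    """Age 50+, latest HbA1c at or above threshold, no earlier result above it.

    Assumption: "abnormal" means >= 48 mmol/mol; the protocol may define it
    differently.
    """
    if record.age_years < MIN_AGE_YEARS or not record.hba1c_history_mmol_mol:
        return False
    *previous, latest = record.hba1c_history_mmol_mol
    return latest >= HBA1C_THRESHOLD_MMOL_MOL and all(
        value < HBA1C_THRESHOLD_MMOL_MOL for value in previous
    )


print(round(hba1c_mmol_mol_to_percent(48.0), 1))        # 6.5
print(is_eligible(PatientRecord(62, [38.0, 41.0, 50.0])))  # True
print(is_eligible(PatientRecord(62, [49.0, 50.0])))        # False
```

The conversion confirms that 48 mmol/mol corresponds to the 6.5% quoted in the abstract. Keeping the history in chronological order makes the "no previous abnormal result" check a single pass over earlier values.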
Affiliation(s)
- Hugh Claridge
  - School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
  - National Physical Laboratory, Teddington, UK
- Claire A Price
  - School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
  - National Physical Laboratory, Teddington, UK
- Rofique Ali
  - Tower Hamlets Network 1 Primary Care Network, London, UK
- Simon de Lusignan
  - Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Adam Harvey-Sullivan
  - Tower Hamlets Network 1 Primary Care Network, London, UK
  - Centre for Primary Care, Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Natalia Khalaf
  - Section of Gastroenterology and Hepatology, Department of Medicine, Baylor College of Medicine, Center for Innovations in Quality, Effectiveness, and Safety (IQuESt), Michael E. DeBakey Veterans Affairs Medical Center, Houston, Texas, USA
- Ali Stunt
  - Pancreatic Cancer Action, Oakhanger, Hampshire, UK
- Agnieszka Lemanska
  - School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
  - National Physical Laboratory, Teddington, UK
4
Rawlani P, Ghosh NK, Kumar A. Role of artificial intelligence in the characterization of indeterminate pancreatic head mass and its usefulness in preoperative diagnosis. Artif Intell Gastroenterol 2023; 4:48-63. [DOI: 10.35712/aig.v4.i3.48] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 09/11/2023] [Accepted: 10/08/2023] [Indexed: 12/07/2023]
Abstract
Artificial intelligence (AI) is used in many areas of day-to-day life, and its role in medicine is considerable. The introduction of AI has improved the understanding of oncology, helping with diagnosis, treatment planning, management, prognosis, and follow-up. It also helps to identify high-risk groups who can be offered timely screening for early detection of malignant conditions. This matters particularly in pancreatic cancer, which is one of the major causes of cancer-related deaths worldwide and has no specific early clinical or radiological features for diagnosis. With improvements in imaging modalities (computed tomography, magnetic resonance imaging, endoscopic ultrasound), clinicians are often faced with lesions that are difficult to diagnose by human assessment alone. AI has been used in other branches of medicine to differentiate such indeterminate lesions, including those of the thyroid gland, breast, lungs, liver, adrenal gland, and kidney. In pancreatic cancer, the role of AI has been explored and investigation is ongoing. This review article focuses on how AI can be used to diagnose pancreatic cancer early or differentiate it from benign pancreatic lesions, so that management can be planned at an earlier stage.
Affiliation(s)
- Palash Rawlani
  - Department of Surgical Gastroenterology, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow 226014, Uttar Pradesh, India
- Nalini Kanta Ghosh
  - Department of Surgical Gastroenterology, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow 226014, Uttar Pradesh, India
- Ashok Kumar
  - Department of Surgical Gastroenterology, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow 226014, Uttar Pradesh, India
5
Jia K, Kundrot S, Palchuk MB, Warnick J, Haapala K, Kaplan ID, Rinard M, Appelbaum L. A pancreatic cancer risk prediction model (Prism) developed and validated on large-scale US clinical data. EBioMedicine 2023; 98:104888. [PMID: 38007948 PMCID: PMC10755107 DOI: 10.1016/j.ebiom.2023.104888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 11/03/2023] [Accepted: 11/10/2023] [Indexed: 11/28/2023]
Abstract
BACKGROUND Pancreatic Ductal Adenocarcinoma (PDAC) screening can enable early-stage disease detection and long-term survival. Current screening guidelines are based on inherited predisposition, under which about 10% of PDAC cases are eligible for screening. Using Electronic Health Record (EHR) data from a multi-institutional federated network, we developed and validated a PDAC RISk Model (Prism) for the general US population to extend early PDAC detection. METHODS A neural network (PrismNN) and a logistic regression model (PrismLR) were developed using EHR data from 55 US Health Care Organisations (HCOs) to predict PDAC risk 6-18 months before diagnosis for patients 40 years or older. Model performance was assessed using Area Under the Curve (AUC) and calibration plots. Models were internal-externally validated by geographic location, race, and time. Simulated model deployment evaluated Standardised Incidence Ratio (SIR) and other metrics. FINDINGS With 35,387 PDAC cases, 1,500,081 controls, and 87 features per patient, PrismNN obtained a test AUC of 0.826 (95% CI: 0.824-0.828) (PrismLR: 0.800 (95% CI: 0.798-0.802)). PrismNN's average internal-external validation AUCs were 0.740 for locations, 0.828 for races, and 0.789 (95% CI: 0.762-0.816) for time. At SIR = 5.10 (exceeding the current screening inclusion threshold) in simulated model deployment, PrismNN sensitivity was 35.9% (specificity 95.3%). INTERPRETATION Prism models demonstrated good accuracy and generalizability across diverse populations. PrismNN could find 3.5 times more cases at comparable risk than current screening guidelines. The small number of features provided a basis for model interpretation. Integration with the federated network provided data from a large, heterogeneous patient population and a pathway to future clinical deployment. FUNDING Prevent Cancer Foundation, TriNetX, Boeing, DARPA, NSF, and Aarno Labs.
Affiliation(s)
- Kai Jia
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Irving D Kaplan
  - Beth Israel Deaconess Medical Center, Boston, MA, 02215, USA
- Martin Rinard
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Limor Appelbaum
  - Beth Israel Deaconess Medical Center, Boston, MA, 02215, USA
6
Li J, Wang X, Cai L, Sun J, Yang Z, Liu W, Wang Z, Lv H. An interpretable deep learning framework for predicting liver metastases in postoperative colorectal cancer patients using natural language processing and clinical data integration. Cancer Med 2023; 12:19337-19351. [PMID: 37694452 PMCID: PMC10557887 DOI: 10.1002/cam4.6523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 08/25/2023] [Accepted: 08/30/2023] [Indexed: 09/12/2023]
Abstract
BACKGROUND The significance of liver metastasis (LM) in increasing the risk of death for postoperative colorectal cancer (CRC) patients necessitates innovative approaches to predict LM. AIM Our study presents a novel and significant contribution by developing an interpretable fusion model that effectively integrates both free-text medical record data and structured laboratory data to predict LM in postoperative CRC patients. METHODS We used a robust dataset of 1463 patients and leveraged state-of-the-art natural language processing (NLP) and machine learning techniques to construct a two-layer fusion framework that demonstrates superior predictive performance compared to single modal models. Our innovative two-tier algorithm fuses the results from different data modalities, achieving balanced prediction results on test data and significantly enhancing the predictive ability of the model. To increase interpretability, we employed Shapley additive explanations to elucidate the contributions of free-text clinical data and structured clinical data to the final model. Furthermore, we translated our findings into practical clinical applications by creating a novel NLP score-based nomogram using the top 13 valid predictors identified in our study. RESULTS The proposed fusion models demonstrated superior predictive performance with an accuracy of 80.8%, precision of 80.3%, recall of 80.5%, and an F1 score of 80.8% in predicting LMs. CONCLUSION This fusion model represents a notable advancement in predicting LMs for postoperative CRC patients, offering the potential to enhance patient outcomes and support clinical decision-making.
Affiliation(s)
- Jia Li
  - Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, People's Republic of China
- Xinghao Wang
  - Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, People's Republic of China
- Linkun Cai
  - Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, People's Republic of China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing, People's Republic of China
- Jing Sun
  - Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, People's Republic of China
- Zhenghan Yang
  - Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, People's Republic of China
- Wenjuan Liu
  - Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, People's Republic of China
  - Department of Radiology, Aerospace Center Hospital, Beijing, People's Republic of China
- Zhenchang Wang
  - Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, People's Republic of China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing, People's Republic of China
- Han Lv
  - Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, People's Republic of China
7
Matchaba S, Fellague-Chebra R, Purushottam P, Johns A. Early Diagnosis of Pancreatic Cancer via Machine Learning Analysis of a National Electronic Medical Record Database. JCO Clin Cancer Inform 2023; 7:e2300076. [PMID: 37816199 DOI: 10.1200/cci.23.00076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 07/24/2023] [Accepted: 08/22/2023] [Indexed: 10/12/2023]
Abstract
PURPOSE Pancreatic cancer (PaC) is often diagnosed at advanced stages, resulting in one of the lowest survival rates among patients with cancer. The purpose of this study was to investigate whether machine learning (ML) models can predict, with high sensitivity and specificity, an increased risk for PaC ahead of clinical diagnosis. METHODS The Optum deidentified electronic health record (EHR) data set was used to extract 1-year data for each patient and to sample for PaC diagnosis, the number of interactions with the health care system, and unique demographic and clinical features. Data for patients with a PaC diagnosis were collected between 1 and 2 years before the diagnosis. Standard binary classification ML models were used on training and testing data sets. Data analyses were performed using the scikit-learn package version 1.0.1. RESULTS The data set consisted of 18,987 patient EHRs collected between December 31, 2007, and December 31, 2017. EHRs with 10 unique features and at least three health care interactions were used for model training (N = 15,189; n = 8,438 [56%] with PaC) and testing (N = 3,798; n = 2,127 [56%] with PaC). The ensemble model achieved an AUC of 0.89, a sensitivity of 85.61%, and a specificity of 76.18% on the testing data set and produced superior results compared with other binary classifiers. Increasing unique health care interactions to nine failed to improve the AUC score. When the testing data set was enlarged to 5,696 patients, the ensemble model achieved an AUC of 0.92 and a specificity of 93.21%, but the sensitivity was compromised. CONCLUSION The ensemble model exceeded the state-of-the-art level of performance for prediction of PaC ahead of clinical diagnosis with a minimal clinically guided input, providing a potential strategy for selection of high-risk patients for further screening.
Affiliation(s)
- Siyabonga Matchaba
  - Health Economics and Evidence Development, Novartis Oncology, East Hanover, NJ
  - Mendel, San Jose, CA
- Adam Johns
  - Health Economics and Evidence Development, Novartis Oncology, East Hanover, NJ
8
Bojesen AB, Mortensen FV, Kirkegård J. Real-Time Identification of Pancreatic Cancer Cases Using Artificial Intelligence Developed on Danish Nationwide Registry Data. JCO Clin Cancer Inform 2023; 7:e2300084. [PMID: 37812754 DOI: 10.1200/cci.23.00084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 07/18/2023] [Accepted: 08/29/2023] [Indexed: 10/11/2023]
Abstract
PURPOSE Pancreatic cancer is expected to be the second leading cause of cancer-related deaths worldwide within a few years. Most patients are not diagnosed in time for curative-intent treatment. Accelerating the time of diagnosis is a key component of reducing pancreatic cancer mortality. We developed and tested a dynamic algorithm aimed at proactively identifying patients with a substantially elevated risk of having undiagnosed pancreatic cancer. METHODS Machine learning methodology was applied to a live stream of nationwide Danish registry data. A hybrid case-control and prospective cohort design relying on incidence density sampling was used. Three models with minimal tuning were tested. All performance evaluation metrics were based on out-of-sample, out-of-time data in a monthly walk-forward strategy to avoid any temporal biases or inflation of performance metrics. The outcome was a diagnosis of pancreatic cancer. RESULTS Subgroups identified had a 10.1% risk of being diagnosed with pancreatic cancer within 1 year, corresponding to a number needed to screen of 9.9. When considering competing, potentially computed tomography-detectable GI cancers, this number is reduced to 5.7. The time of diagnosis can be accelerated by up to 142 days. CONCLUSION Currently available nationwide live data and computational resources are sufficient for real-time identification of individuals with at least 10.1% risk of having undiagnosed pancreatic cancer and 17.7% risk of any GI cancer in the Danish population. For prospective identification of high-risk patients, the area under the curve is not a useful indication of the positive predictive values achieved. Viable design solutions are demonstrated, which address the main shortfalls of the existing cancer prediction efforts in relation to temporal biases, leaks, and performance metric inflation. Efficacy evaluations with resection rates and mortality as end points are needed.
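The number needed to screen quoted above is the reciprocal of the risk in the identified subgroup. The following is a minimal sketch of that arithmetic, assuming the simple 1/risk definition; the function name `number_needed_to_screen` is illustrative and not taken from the study.

```python
def number_needed_to_screen(risk: float) -> float:
    """Return 1 / risk, where risk is the probability of disease in the subgroup."""
    if not 0 < risk <= 1:
        raise ValueError("risk must be in the interval (0, 1]")
    return 1.0 / risk


# Rounded risks reported in the abstract.
nns_pancreatic = number_needed_to_screen(0.101)  # pancreatic cancer within 1 year
nns_any_gi = number_needed_to_screen(0.177)      # any GI cancer

print(round(nns_pancreatic, 1))  # 9.9
print(round(nns_any_gi, 2))      # 5.65
```

The first value reproduces the reported 9.9. The second lands near the reported 5.7; the small difference is presumably because the abstract's percentages are themselves rounded.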
Affiliation(s)
- Anders Bo Bojesen
  - Department of Surgery, HPB Section, Aarhus University Hospital, Aarhus, Denmark
- Frank Viborg Mortensen
  - Department of Surgery, HPB Section, Aarhus University Hospital, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Jakob Kirkegård
  - Department of Surgery, HPB Section, Aarhus University Hospital, Aarhus, Denmark
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
9
Khan S, Bhushan B. Machine Learning Predicts Patients With New-onset Diabetes at Risk of Pancreatic Cancer. J Clin Gastroenterol 2023:00004836-990000000-00190. [PMID: 37522752 DOI: 10.1097/mcg.0000000000001897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 06/22/2023] [Indexed: 08/01/2023]
Abstract
BACKGROUND Patients with new-onset diabetes represent a high-risk cohort to screen for pancreatic cancer. GOALS To develop a machine learning model to predict pancreatic cancer among patients with new-onset diabetes. STUDY A retrospective cohort of patients with new-onset diabetes was assembled from multiple health care networks in the United States. An XGBoost machine learning model was designed from a portion of this cohort (the training set) and tested on the remaining part of the cohort (the test set). Shapley values were used to explain the XGBoost model's features. Model performance was compared with 2 contemporary models designed to predict pancreatic cancer among patients with new-onset diabetes. RESULTS In the test set, the XGBoost model had an area under the curve of 0.80 (0.76 to 0.85) compared with 0.63 and 0.68 for the other models. Using cutoffs based on the Youden index, the sensitivity of the XGBoost model was 75%, the specificity was 70%, the accuracy was 70%, the positive predictive value was 1.2%, and the negative predictive value was >99%. The XGBoost model obtained a positive predictive value of at least 2.5% with a sensitivity of 38%. The XGBoost model was the only model that detected at least 50% of patients with cancer one year after the onset of diabetes. All 3 models had similar features that predicted pancreatic cancer, including older age, weight loss, and the rapid destabilization of glucose homeostasis. CONCLUSION Machine learning models can isolate a cohort at high risk for pancreatic cancer from among patients with new-onset diabetes.
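The low positive predictive value alongside reasonable sensitivity and specificity follows from Bayes' rule when the disease is rare. The sketch below is illustrative only: the function names are hypothetical, the inputs are the rounded figures from the abstract, and the implied prevalence is an approximation derived from those figures, not a number reported by the authors.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Bayes' rule: P(disease | positive test)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


def implied_prevalence(sensitivity: float, specificity: float,
                       ppv: float) -> float:
    """Solve the PPV formula for prevalence, given the other three quantities."""
    fpr = 1 - specificity
    return ppv * fpr / (sensitivity * (1 - ppv) + ppv * fpr)


# Rounded values at the Youden-index cutoff, as quoted in the abstract.
prevalence = implied_prevalence(sensitivity=0.75, specificity=0.70, ppv=0.012)
print(f"{prevalence:.2%}")  # roughly half a percent

# Round trip: the implied prevalence reproduces the reported PPV.
print(round(positive_predictive_value(0.75, 0.70, prevalence), 3))  # 0.012
```

With a prevalence near half a percent, false positives from the 30% of non-cases flagged far outnumber true positives, which is why the abstract also reports a stricter operating point that trades sensitivity for a higher positive predictive value.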
Affiliation(s)
- Salman Khan
  - Department of Medicine, West Virginia University School of Medicine, West Virginia University, Morgantown, WV
  - Northeast Ohio Medical University, Rootstown, OH
- Bharath Bhushan
  - Department of Medicine, West Virginia University School of Medicine, West Virginia University, Morgantown, WV
10
Biziaev T, Aktary ML, Wang Q, Chekouo T, Bhatti P, Shack L, Robson PJ, Kopciuk KA. Development and External Validation of Partial Proportional Odds Risk Prediction Models for Cancer Stage at Diagnosis among Males and Females in Canada. Cancers (Basel) 2023; 15:3545. [PMID: 37509208 PMCID: PMC10377619 DOI: 10.3390/cancers15143545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 07/04/2023] [Accepted: 07/05/2023] [Indexed: 07/30/2023]
Abstract
Risk prediction models for cancer stage at diagnosis may identify individuals at higher risk of late-stage cancer diagnoses. Partial proportional odds risk prediction models for cancer stage at diagnosis for males and females were developed using data from Alberta's Tomorrow Project (ATP). Prediction models were validated on the British Columbia Generations Project (BCGP) cohort using discrimination and calibration measures. Among ATP males, older age at diagnosis was associated with an earlier stage at diagnosis, while full- or part-time employment, prostate-specific antigen testing, and former/current smoking were associated with a later stage at diagnosis. Among ATP females, mammogram and sigmoidoscopy or colonoscopy were associated with an earlier stage at diagnosis, while older age at diagnosis, number of pregnancies, and hysterectomy were associated with a later stage at diagnosis. On external validation, discrimination results were poor for both males and females while calibration results indicated that the models did not over- or under-fit to derivation data or over- or under-predict risk. Multiple factors associated with cancer stage at diagnosis were identified among ATP participants. While the prediction model calibration was acceptable, discrimination was poor when applied to BCGP data. Updating our models with additional predictors may help improve predictive performance.
Affiliation(s)
- Timofei Biziaev
  - Department of Mathematics and Statistics, University of Calgary, Calgary, AB T2N 4N2, Canada
- Michelle L Aktary
  - Faculty of Kinesiology, University of Calgary, Calgary, AB T2N 1N4, Canada
- Qinggang Wang
  - Cancer Epidemiology and Prevention Research, Cancer Care Alberta, Alberta Health Services, Calgary, AB T2S 3C3, Canada
- Thierry Chekouo
  - Department of Mathematics and Statistics, University of Calgary, Calgary, AB T2N 4N2, Canada
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Parveen Bhatti
  - Cancer Control Research, BC Cancer, Vancouver, BC V5Z 1L3, Canada
  - School of Population and Public Health, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Lorraine Shack
  - Cancer Surveillance and Reporting, Alberta Health Services, Calgary, AB T2S 3C3, Canada
- Paula J Robson
  - Department of Agricultural, Food and Nutritional Science and School of Public Health, University of Alberta, Edmonton, AB T6G 2P5, Canada
  - Cancer Care Alberta and Cancer Strategic Clinical Network, Alberta Health Services, Edmonton, AB T5J 3H1, Canada
- Karen A Kopciuk
  - Department of Mathematics and Statistics, University of Calgary, Calgary, AB T2N 4N2, Canada
  - Cancer Epidemiology and Prevention Research, Cancer Care Alberta, Alberta Health Services, Calgary, AB T2S 3C3, Canada
  - Departments of Oncology, Community Health Sciences, University of Calgary, Calgary, AB T2N 4N2, Canada
11
Placido D, Yuan B, Hjaltelin JX, Zheng C, Haue AD, Chmura PJ, Yuan C, Kim J, Umeton R, Antell G, Chowdhury A, Franz A, Brais L, Andrews E, Marks DS, Regev A, Ayandeh S, Brophy MT, Do NV, Kraft P, Wolpin BM, Rosenthal MH, Fillmore NR, Brunak S, Sander C. A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nat Med 2023; 29:1113-1122. [PMID: 37156936 PMCID: PMC10202814 DOI: 10.1038/s41591-023-02332-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 03/10/2022] [Accepted: 03/31/2023] [Indexed: 05/10/2023]
Abstract
Pancreatic cancer is an aggressive disease that typically presents late with poor outcomes, indicating a pronounced need for early detection. In this study, we applied artificial intelligence methods to clinical data from 6 million patients (24,000 pancreatic cancer cases) in Denmark (Danish National Patient Registry (DNPR)) and from 3 million patients (3,900 cases) in the United States (US Veterans Affairs (US-VA)). We trained machine learning models on the sequence of disease codes in clinical histories and tested prediction of cancer occurrence within incremental time windows (CancerRiskNet). For cancer occurrence within 36 months, the performance of the best DNPR model has area under the receiver operating characteristic (AUROC) curve = 0.88 and decreases to AUROC (3m) = 0.83 when disease events within 3 months before cancer diagnosis are excluded from training, with an estimated relative risk of 59 for 1,000 highest-risk patients older than age 50 years. Cross-application of the Danish model to US-VA data had lower performance (AUROC = 0.71), and retraining was needed to improve performance (AUROC = 0.78, AUROC (3m) = 0.76). These results improve the ability to design realistic surveillance programs for patients at elevated risk, potentially benefiting lifespan and quality of life by early detection of this aggressive cancer.
Affiliation(s)
- Davide Placido
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Bo Yuan
  - Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
- Jessica X Hjaltelin
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Chunlei Zheng
  - VA Boston Healthcare System, Boston, MA, USA
  - Boston University School of Medicine, Boston, MA, USA
- Amalie D Haue
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Piotr J Chmura
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Chen Yuan
  - Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
- Jihye Kim
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Renato Umeton
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Massachusetts Institute of Technology, Cambridge, MA, USA
  - Weill Cornell Medicine, New York City, NY, USA
- Alexandra Franz
  - Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
- Aviv Regev
  - Broad Institute of MIT and Harvard, Boston, MA, USA
  - Genentech, Inc., South San Francisco, CA, USA
- Mary T Brophy
  - VA Boston Healthcare System, Boston, MA, USA
  - Boston University School of Medicine, Boston, MA, USA
- Nhan V Do
  - VA Boston Healthcare System, Boston, MA, USA
  - Boston University School of Medicine, Boston, MA, USA
- Peter Kraft
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Brian M Wolpin
  - Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Brigham and Women's Hospital, Boston, MA, USA
- Michael H Rosenthal
  - Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Brigham and Women's Hospital, Boston, MA, USA
- Nathanael R Fillmore
  - Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
  - Boston University School of Medicine, Boston, MA, USA
- Søren Brunak
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Chris Sander
  - Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Boston, MA, USA
12
Karar ME, El-Fishawy N, Radad M. Automated classification of urine biomarkers to diagnose pancreatic cancer using 1-D convolutional neural networks. J Biol Eng 2023; 17:28. [PMID: 37069681 PMCID: PMC10111836 DOI: 10.1186/s13036-023-00340-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/14/2023] [Accepted: 03/13/2023] [Indexed: 04/19/2023] [Open Access]
Abstract
BACKGROUND Early diagnosis of pancreatic ductal adenocarcinoma (PDAC) is key to patient survival. The urine proteomic biomarkers creatinine, LYVE1, REG1B, and TFF1 offer a promising non-invasive and inexpensive method for diagnosing PDAC. Recent use of both microfluidics technology and artificial intelligence techniques enables accurate detection and analysis of these biomarkers. This paper proposes a new deep-learning model to identify urine biomarkers for the automated diagnosis of pancreatic cancer. The proposed model is composed of one-dimensional convolutional neural networks (1D-CNNs) and long short-term memory (LSTM). It automatically categorizes patients into healthy pancreas, benign hepatobiliary disease, and PDAC cases. RESULTS Experiments and evaluations were carried out on a public dataset of 590 urine samples in three classes: 183 healthy pancreas samples, 208 benign hepatobiliary disease samples, and 199 PDAC samples. The results demonstrated that the proposed 1D-CNN + LSTM model achieved the best accuracy (97%) and area under the curve (AUC, 98%) compared with state-of-the-art models for diagnosing pancreatic cancer from urine biomarkers. CONCLUSION A new efficient 1D-CNN + LSTM model has been developed for early PDAC diagnosis using the four proteomic urine biomarkers creatinine, LYVE1, REG1B, and TFF1. The model outperformed other machine learning classifiers reported in previous studies. The main prospect of this study is laboratory realization of the proposed deep classifier on urinary biomarker panels to assist diagnostic procedures for pancreatic cancer patients.
Affiliation(s)
- Mohamed Esmail Karar
  - Department of Industrial Electronics and Control Engineering, Faculty of Electronic Engineering, Menoufia University, Al Minufiyah, Egypt
- Nawal El-Fishawy
  - Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Al Minufiyah, Egypt
- Marwa Radad
  - Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Al Minufiyah, Egypt
13
Jan Z, El Assadi F, Abd-alrazaq A, Jithesh PV. Artificial Intelligence for the Prediction and Early Diagnosis of Pancreatic Cancer: Scoping Review (Preprint). [DOI: 10.2196/preprints.44248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
BACKGROUND
Pancreatic cancer is the 12th most common cancer worldwide, with an overall survival rate of 4.9%. Early diagnosis of pancreatic cancer is essential for timely treatment and survival. Artificial intelligence (AI) provides advanced models and algorithms for better diagnosis of pancreatic cancer.
OBJECTIVE
This study aims to explore AI models used for the prediction and early diagnosis of pancreatic cancers as reported in the literature.
METHODS
A scoping review was conducted and reported in line with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. PubMed, Google Scholar, Science Direct, BioRXiv, and MedRxiv were explored to identify relevant articles. Study selection and data extraction were independently conducted by 2 reviewers. Data extracted from the included studies were synthesized narratively.
RESULTS
Of the 1185 publications, 30 studies were included in the scoping review. The included articles reported the use of AI for 6 different purposes. Of these included articles, AI techniques were mostly used for the diagnosis of pancreatic cancer (14/30, 47%). Radiological images (14/30, 47%) were the most frequently used data in the included articles. Most of the included articles used data sets with a size of <1000 samples (11/30, 37%). Deep learning models were the most prominent branch of AI used for pancreatic cancer diagnosis in the studies, and the convolutional neural network was the most used algorithm (18/30, 60%). Six validation approaches were used in the included studies, of which the most frequently used approaches were k-fold cross-validation (10/30, 33%) and external validation (10/30, 33%). A higher level of accuracy (99%) was found in studies that used support vector machine, decision trees, and k-means clustering algorithms.
CONCLUSIONS
This review presents an overview of studies based on AI models and algorithms used to predict and diagnose pancreatic cancer patients. AI is expected to play a vital role in advancing pancreatic cancer prediction and diagnosis. Further research is required to provide data that support clinical decisions in health care.
14
Lin KW, Ang TL, Li JW. Role of artificial intelligence in early detection and screening for pancreatic adenocarcinoma. Artif Intell Med Imaging 2022; 3:21-32. [DOI: 10.35711/aimi.v3.i2.21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 02/12/2022] [Accepted: 03/17/2022] [Indexed: 02/06/2023] [Open Access]
Abstract
Pancreatic adenocarcinoma remains one of the deadliest malignancies in the world despite treatment advances over the past few decades. Its low survival rates and poor prognosis can be attributed to ambiguity in screening recommendations and late symptom onset, which contribute to its late presentation. In recent years, artificial intelligence (AI) has emerged as a field to aid clinical decision making. Considerable efforts have been made in the realm of AI to screen for and predict future development of pancreatic ductal adenocarcinoma. This review discusses the use of AI in early detection and screening for pancreatic adenocarcinoma, and factors that may limit its use in a clinical setting.
Affiliation(s)
- Kenneth Weicong Lin
  - Department of Gastroenterology and Hepatology, Changi General Hospital, Singapore 529889, Singapore
- Tiing Leong Ang
  - Department of Gastroenterology and Hepatology, Changi General Hospital, Singapore 529889, Singapore
- James Weiquan Li
  - Department of Gastroenterology and Hepatology, Changi General Hospital, Singapore 529889, Singapore
15
Stott MC, Oldfield L, Hale J, Costello E, Halloran CM. Recent advances in understanding pancreatic cancer. Fac Rev 2022; 11:9. [PMID: 35509672 PMCID: PMC9022729 DOI: 10.12703/r/11-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022] [Open Access]
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is an intractable cancer and a leading cause of cancer deaths worldwide. Over 90% of patients die within 1 year of diagnosis. Deaths from PDAC are increasing and it remains a cancer of substantial unmet need. A number of factors contribute to its poor prognosis: namely, late presentation, early metastases and limited systemic therapy options because of chemoresistance. A variety of research approaches underway are aimed at improving patient survival. Here, we review high-risk groups and efforts for early detection. We examine recent developments in the understanding of complex molecular and metabolic alterations which accompany PDAC. We explore artificial intelligence and biological targets for therapy and examine the role of tumour stroma and the immune microenvironment. We also review recent developments with respect to the PDAC microbiome. It is hoped that current research efforts will translate into earlier diagnosis, improvements in treatment and better outcomes for patients.
Affiliation(s)
- Martyn C Stott
  - Department of Molecular & Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Sherrington Building, Liverpool, UK
- Lucy Oldfield
  - Department of Molecular & Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Sherrington Building, Liverpool, UK
- Jessica Hale
  - Department of Molecular & Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Sherrington Building, Liverpool, UK
- Eithne Costello
  - Department of Molecular & Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Sherrington Building, Liverpool, UK
- Christopher M Halloran
  - Department of Molecular & Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Sherrington Building, Liverpool, UK
|