1
Cui F, Dang X, Peng D, She Y, Wang Y, Yang R, Han Z, Liu Y, Yang H. Association of sarcopenia with all-cause and cause-specific mortality in cancer patients: development and validation of a 3-year and 5-year survival prediction model. BMC Cancer 2025; 25:919. [PMID: 40405088 PMCID: PMC12100792 DOI: 10.1186/s12885-025-14303-9] [Received: 07/12/2024] [Accepted: 05/09/2025] [Indexed: 05/24/2025] Open
Abstract
BACKGROUND Sarcopenia is a clinicopathological condition characterized by a decrease in muscle strength and muscle mass, playing a crucial role in the prognosis of cancer. Therefore, this study aims to investigate the association between sarcopenia and both all-cause mortality and cancer-specific mortality among cancer patients. Furthermore, we plan to develop risk prediction models using machine learning algorithms to predict 3-year and 5-year survival rates in cancer patients. METHOD This study included 1095 cancer patients from the National Health and Nutrition Examination Survey (NHANES) cohorts spanning 1999-2006 and 2011-2014. Initially, we used the Least Absolute Shrinkage and Selection Operator (LASSO)-Cox regression models for feature selection. Subsequently, we employed multivariable Cox regression models to investigate the association between sarcopenia and all-cause and cancer-specific mortality in cancer patients. We developed five machine learning algorithms, including Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), LightGBM, and XGBoost, to predict 3-year and 5-year survival rates and to perform risk stratification. RESULTS The multivariable Cox regression model showed that sarcopenia significantly increases the risk of all-cause mortality (HR = 1.33, 95% CI: 1.05, 1.70, P = 0.0194) and cancer-specific mortality (HR = 1.67, 95% CI: 1.09, 2.55, P = 0.0176) in cancer patients. Among the five machine learning algorithms developed, the LightGBM model demonstrated strong performance in the 3-year and 5-year survival prediction tasks, making it the optimal model selection. Decision curve analysis and Kaplan-Meier curves further confirmed our model's ability to identify high-risk individuals effectively. CONCLUSIONS Sarcopenia significantly increases the risk of mortality in cancer patients. We developed a survival prediction model for cancer patients that effectively identifies high-risk individuals, thereby providing a foundation for personalized survival assessment.
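As a reading aid for the hazard ratios quoted above, the following is a minimal sketch of how a two-sided p-value can be approximately recovered from a hazard ratio and its 95% confidence interval. It assumes a Wald-type interval that is symmetric on the log scale; the function name `p_from_hr_ci` is hypothetical, and this is not the authors' analysis code.

```python
import math


def p_from_hr_ci(hr, lower, upper, z_crit=1.96):
    """Approximate SE, z and two-sided p from an HR and its 95% CI.

    Assumes the interval was built as exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hr)
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided tail probability of a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values taken from the abstract above.
se_all, z_all, p_all = p_from_hr_ci(1.33, 1.05, 1.70)
se_cs, z_cs, p_cs = p_from_hr_ci(1.67, 1.09, 2.55)
print(f"all-cause: SE={se_all:.3f}, z={z_all:.2f}, p~{p_all:.3f}")
print(f"cancer-specific: SE={se_cs:.3f}, z={z_cs:.2f}, p~{p_cs:.3f}")
```

Because the published estimates are rounded, the recovered p-values should only be expected to land near the reported ones, not to match them exactly.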
Affiliation(s)
- Feng Cui
  - Department of General Surgery, Lanzhou University Second Hospital, Cui Ying Men No.80, Lanzhou, 730030, Gansu Province, People's Republic of China
- Xiangji Dang
  - Department of Pharmaceutical, Lanzhou University Second Hospital, Cui Ying Men No.80, Lanzhou, 730030, Gansu Province, People's Republic of China
- Daiyun Peng
  - Department of Nuclear Medicine, Lanzhou University Second Hospital, Cui Ying Men No.80, Lanzhou, 730030, Gansu Province, People's Republic of China
- Yuanhua She
  - Department of General Surgery, Lanzhou University Second Hospital, Cui Ying Men No.80, Lanzhou, 730030, Gansu Province, People's Republic of China
- Yubin Wang
  - Lanzhou University Second Hospital, Cui Ying Men No.80, Lanzhou, 730030, Gansu Province, People's Republic of China
- Ruifeng Yang
  - School of Second Clinical Medical, Lanzhou University, Donggang West Road No. 199, Lanzhou, 730030, Gansu Province, People's Republic of China
- Zhiyao Han
  - School of Second Clinical Medical, Lanzhou University, Donggang West Road No. 199, Lanzhou, 730030, Gansu Province, People's Republic of China
- Yan Liu
  - Gansu High Throughput Screening and Creation Center for Health Products, School of Pharmacy, Lanzhou University, Donggang West Road No. 199, Lanzhou, 730020, People's Republic of China
- Hanteng Yang
  - Department of General Surgery, Lanzhou University Second Hospital, Cui Ying Men No.80, Lanzhou, 730030, Gansu Province, People's Republic of China
2
Negoi I. Personalized surveillance in colorectal cancer: Integrating circulating tumor DNA and artificial intelligence into post-treatment follow-up. World J Gastroenterol 2025; 31:106670. [DOI: 10.3748/wjg.v31.i18.106670] [Received: 03/03/2025] [Revised: 04/07/2025] [Accepted: 04/18/2025] [Indexed: 05/13/2025] Open
Abstract
Given the growing burden of colorectal cancer (CRC) as a global health challenge, it becomes imperative to focus on strategies that can mitigate its impact. Post-treatment surveillance has emerged as essential for early detection of recurrence, significantly improving patient outcomes. However, intensive surveillance strategies have shown mixed results compared to less intensive methods, emphasizing the necessity for personalized, risk-adapted approaches. The observed suboptimal adherence to existing surveillance protocols underscores the urgent need for more tailored and efficient strategies. In this context, circulating tumor DNA (ctDNA) emerges as a promising biomarker with significant potential to revolutionize post-treatment surveillance, demonstrating high specificity [0.95, 95% confidence interval (CI): 0.91-0.97] and robust diagnostic odds (37.6, 95%CI: 20.8-68.0) for recurrence detection. Furthermore, artificial intelligence and machine learning models integrating patient-specific and tumor features can enhance risk stratification and optimize surveillance strategies. The reported area under the receiver operating characteristic curve, measuring artificial intelligence model performance in predicting CRC recurrence, ranged from 0.581 and 0.593 at the lowest to 0.979 and 0.978 at the highest in training and validation cohorts, respectively. Despite this promise, addressing cost, accessibility, and extensive validation remains crucial for equitable integration into clinical practice.
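For readers unfamiliar with the diagnostic odds ratio quoted above, here is a minimal sketch of how it relates to sensitivity and specificity in a single study. The specificity of 0.95 comes from the abstract; the sensitivity of 0.60 is a purely hypothetical value chosen for illustration, and a pooled meta-analytic estimate such as 37.6 is generally not obtained by plugging pooled values into this formula.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Diagnostic odds ratio as the ratio of likelihood ratios (LR+ / LR-)."""
    positive_lr = sensitivity / (1 - specificity)   # LR+
    negative_lr = (1 - sensitivity) / specificity   # LR-
    return positive_lr / negative_lr


# Specificity from the abstract; sensitivity is a hypothetical placeholder.
dor = diagnostic_odds_ratio(sensitivity=0.60, specificity=0.95)
print(f"DOR = {dor:.1f}")
```

A higher ratio means the odds of a positive test are much greater among patients with recurrence than among those without.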
Affiliation(s)
- Ionut Negoi
  - Department of General Surgery, Carol Davila University of Medicine and Pharmacy Bucharest, Clinical Emergency Hospital of Bucharest, Bucharest 014461, Romania
3
Santos CS, Amorim-Lopes M. Externally validated and clinically useful machine learning algorithms to support patient-related decision-making in oncology: a scoping review. BMC Med Res Methodol 2025; 25:45. [PMID: 39984835 PMCID: PMC11843972 DOI: 10.1186/s12874-025-02463-y] [Received: 11/09/2023] [Accepted: 01/03/2025] [Indexed: 02/23/2025] Open
Abstract
BACKGROUND This scoping review systematically maps externally validated machine learning (ML)-based models in cancer patient care, quantifying their performance and clinical utility and examining relationships between models, cancer types, and clinical decisions. By synthesizing evidence, this study identifies strengths, limitations, and areas requiring further research. METHODS The review followed the Joanna Briggs Institute's methodology, Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews guidelines, and the Population, Concept, and Context mnemonic. Searches were conducted across Embase, IEEE Xplore, PubMed, Scopus, and Web of Science (January 2014-September 2022), targeting English-language quantitative studies in Q1 journals (SciMago Journal and Country Ranking > 1) that used ML to evaluate clinical outcomes for human cancer patients with commonly available data. Eligible models required external validation, clinical utility assessment, and performance metric reporting. Studies involving genetics, synthetic patients, plants, or animals were excluded. Results were presented in tabular, graphical, and descriptive form. RESULTS From 4023 deduplicated abstracts and 636 full-text reviews, 56 studies (2018-2022) met the inclusion criteria, covering diverse cancer types and applications. Convolutional neural networks were most prevalent, demonstrating high performance, followed by gradient- and decision tree-based algorithms. Other algorithms, though underrepresented, showed promise. Lung and digestive system cancers were most frequently studied, focusing on diagnosis and outcome predictions. Most studies were retrospective and multi-institutional, primarily using image-based data, followed by text-based and hybrid approaches. Clinical utility assessments involved 499 clinicians and 12 tools, indicating improved clinician performance with AI assistance and superior performance to standard clinical systems. DISCUSSION Interest in ML-based clinical decision-making has grown in recent years alongside increased multi-institutional collaboration. However, small sample sizes likely impacted data quality and generalizability. Persistent challenges include limited international validation across ethnicities, inconsistent data sharing, disparities in validation metrics, and insufficient calibration reporting, hindering model comparison reliability. CONCLUSION Successful integration of ML in oncology decision-making requires standardized data and methodologies, larger sample sizes, greater transparency, and robust validation and clinical utility assessments. OTHER Financed by FCT-Fundação para a Ciência e a Tecnologia (Portugal, project LA/P/0063/2020, grant 2021.09040.BD) as part of CSS's Ph.D. This work was not registered.
Affiliation(s)
- Catarina Sousa Santos
  - Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Porto, Portugal
- Mário Amorim-Lopes
  - Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), Porto, Portugal
4
Ayubi E, Farashi S, Tapak L, Afshar S. Development and validation of a biomarker-based prediction model for metastasis in patients with colorectal cancer: Application of machine learning algorithms. Heliyon 2025; 11:e41443. [PMID: 39839508 PMCID: PMC11748706 DOI: 10.1016/j.heliyon.2024.e41443] [Received: 03/02/2024] [Revised: 12/21/2024] [Accepted: 12/22/2024] [Indexed: 01/23/2025] Open
Abstract
Objective The purpose of the current study was to develop and validate a biomarker-based prediction model for metastasis in patients with colorectal cancer (CRC). Methods Two datasets, GSE68468 and GSE41568, were retrieved from the Gene Expression Omnibus (GEO) database. In the GSE68468 dataset, key biomarkers were identified through a screening process involving differential expression analysis, redundancy analysis, and recursive feature elimination technique. Subsequently, the prediction model was developed and internally validated using five machine learning (ML) algorithms including lasso and elastic-net regularized generalized linear model (glmnet), k-nearest neighbors (kNN), support vector machine (SVM) with Radial Basis Function Kernel, random forest (RF), and eXtreme Gradient Boosting (XGBoost). The predictive performance of the algorithm with the highest accuracy was then externally validated on the GSE41568 dataset. Results Among 22,283 registered genes in the GSE68468 dataset, the screening process identified 16 key genes including MMP3, CCDC102B, CDH2, SCGB1A1, KRT7, CYP1B1, LAMC3, ALB, DIXDC1, VWF, MMP1, CYP4B1, NKX3-2, TMEM158, GADD45B, SERPINA1 and these genes were used to build the prediction model. On the internal validation dataset, the prediction performance of five ML algorithms was as follows; RF (accuracy = 0.97 and kappa = 0.91), XGBoost (0.93, 0.81), kNN (0.93, 0.81), glmnet (0.93, 0.82) and SVM (0.92, 0.80). Top five biomarkers were MMP3, CCDC102B, CDH2, VWF and MMP1. The RF model exhibited an accuracy of 0.97, a kappa value of 0.92, and an area under the curve (AUC) of 0.99 in the external validation dataset. Conclusion The results of this study have identified biomarkers through ML algorithms which help to identify patients with CRC prone to metastasis.
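The abstract above reports accuracy together with Cohen's kappa. As a minimal sketch of how the two differ, the function below computes both from a confusion matrix. The counts are hypothetical and are not the study's data; the function name `accuracy_and_kappa` is likewise illustrative.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: 72 non-metastatic and 28 metastatic samples.
confusion = [[70, 2],
             [1, 27]]
acc, kappa = accuracy_and_kappa(confusion)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Kappa discounts the agreement that class imbalance alone would produce, which is why it sits below accuracy whenever one class dominates.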
Affiliation(s)
- Erfan Ayubi
  - Cancer Research Center, Institute of Cancer, Avicenna Health Research Institute, Hamadan University of Medical Sciences, Hamadan, Iran
- Sajjad Farashi
  - Neurophysiology Research Center, Institute of Neuroscience and Mental Health, Avicenna Health Research Institute, Hamadan University of Medical Sciences, Hamadan, Iran
- Leili Tapak
  - Modeling of Noncommunicable Diseases Research Center, Institute of Health Sciences and Technologies, Avicenna Health Research Institute, Hamadan University of Medical Sciences, Hamadan, Iran
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Saeid Afshar
  - Cancer Research Center, Institute of Cancer, Avicenna Health Research Institute, Hamadan University of Medical Sciences, Hamadan, Iran
  - Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Hamadan University of Medical Sciences, Hamadan, Iran
5
Jafarkhani A, Imani B, Saeedi S, Shams A. Predicting Factors Affecting Survival Rate in Patients Undergoing On-Pump Coronary Artery Bypass Graft Surgery Using Machine Learning Methods: A Systematic Review. Health Sci Rep 2025; 8:e70336. [PMID: 39846048 PMCID: PMC11751876 DOI: 10.1002/hsr2.70336] [Received: 08/10/2024] [Revised: 12/03/2024] [Accepted: 12/25/2024] [Indexed: 01/24/2025] Open
Abstract
Background and Aim Coronary artery bypass grafting (CABG) is a key treatment for coronary artery disease, but accurately predicting patient survival after the procedure presents significant challenges. This study aimed to systematically review articles using machine learning techniques to predict patient survival rates and identify factors affecting these rates after CABG surgery. Methods From January 1, 2015, to January 20, 2024, a comprehensive literature search was conducted across PubMed, Scopus, IEEE Xplore, and Web of Science. The review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Inclusion criteria included studies that evaluated survival rates and predictors associated with CABG patients during the specified period. Results After eliminating duplicates, a total of 1330 articles were identified. Following a systematic screening, 24 studies met the inclusion criteria. Our findings revealed 43 distinct factors influencing survival rates in patients undergoing CABG. Notably, five factors-age, ejection fraction, diabetes mellitus, a history of cerebrovascular disease or accidents, and renal function-were consistently identified across multiple studies as significant predictors of postsurgical survival. Conclusion This systematic review identifies key factors influencing survival rates after CABG surgery and highlights the role of machine learning in improving predictive accuracy. By identifying high-risk patients through these key factors, our findings offer practical insights for healthcare providers, enhancing patient management and customizing therapeutic strategies after CABG. This study significantly enhances existing literature by combining machine learning techniques with clinical factors, thereby improving the understanding of patient outcomes in CABG surgery.
Affiliation(s)
- Alireza Jafarkhani
  - Department of Operating Room, School of Paramedicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Behzad Imani
  - Department of Operating Room, School of Paramedicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Soheila Saeedi
  - Department of Health Information Technology, School of Allied Medical Sciences, Hamadan University of Medical Sciences, Hamadan, Iran
- Amir Shams
  - Department of Cardiac Surgery, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
6
Yadalam PK, Thirukkumaran PV, Natarajan PM, Ardila CM. Light gradient boost tree classifier predictions on appendicitis with periodontal disease from biochemical and clinical parameters. FRONTIERS IN ORAL HEALTH 2024; 5:1462873. [PMID: 39346113 PMCID: PMC11427431 DOI: 10.3389/froh.2024.1462873] [Received: 07/10/2024] [Accepted: 08/29/2024] [Indexed: 10/01/2024] Open
Abstract
INTRODUCTION Untreated periodontitis significantly increases the risk of tooth loss, often delaying treatment due to asymptomatic phases. Recent studies have increasingly associated poor dental health with conditions such as rheumatoid arthritis, diabetes, obesity, pneumonia, cardiovascular disease, and renal illness. Despite these connections, limited research has investigated the relationship between appendicitis and periodontal disease. This study aims to predict appendicitis in patients with periodontal disease using biochemical and clinical parameters through the application of a light gradient boost tree classifier. METHODS Data from 125 patient records at Saveetha Institute of Dental College and Medical College were pre-processed and analyzed. We utilized data preprocessing techniques, feature selection methods, and model development approaches to estimate the risk of appendicitis in patients with periodontitis. Both Random Forest and Light Gradient Boosting algorithms were evaluated for accuracy using confusion matrices to assess their predictive performance. RESULTS The Random Forest model achieved an accuracy of 94%, demonstrating robust predictive capability in this context. In contrast, the Light Gradient Boost algorithms achieved a significantly higher accuracy of 98%, underscoring their superior predictive efficiency. This substantial difference highlights the importance of algorithm selection and optimization in developing reliable predictive models. The higher accuracy of Light Gradient Boost algorithms suggests effective minimization of prediction errors and improved differentiation between appendicitis with periodontitis and healthy states. Our study identifies age, white blood cell count, and symptom duration as pivotal predictors for detecting concurrent periodontitis in acute appendicitis cases. 
CONCLUSIONS The newly developed prediction model introduces a novel and promising approach, providing valuable insights into distinguishing between periodontitis and acute appendicitis. These findings highlight the potential to improve diagnostic accuracy and support informed clinical decision-making in patients presenting with both conditions, offering new avenues for optimizing patient care strategies.
Affiliation(s)
- Pradeep Kumar Yadalam
  - Department of Periodontics, Saveetha Dental College, SIMATS, Saveetha University, Chennai, India
  - Saveetha Institute of Medical and Technical Science [SIMATS], Saveetha University, Chennai, India
- Prabhu Manickam Natarajan
  - Department of Clinical Sciences, Center of Medical and Bio-Allied Health Sciences and Research, College of Dentistry, Ajman University, Ajman, United Arab Emirates
- Carlos M. Ardila
  - Department of Basic Sciences, Universidad de Antioquia U de A, Medellín, Colombia
  - Biomedical Stomatology Research Group, Universidad de Antioquia U de A, Medellín, Colombia
7
Wang Y, Liu K, He W, Dan J, Zhu M, Chen L, Zhou W, Li M, Li J. Precision prognosis of colorectal cancer: a multi-tiered model integrating microsatellite instability genes and clinical parameters. Front Oncol 2024; 14:1396726. [PMID: 39055563 PMCID: PMC11269184 DOI: 10.3389/fonc.2024.1396726] [Received: 03/06/2024] [Accepted: 06/24/2024] [Indexed: 07/27/2024] Open
Abstract
Background Prognostic assessment for colorectal cancer (CRC) displays substantial heterogeneity, as reliance solely on traditional TNM staging falls short of achieving precise individualized predictions. The integration of diverse biological information sources holds the potential to enhance prognostic accuracy. Objective To establish a comprehensive multi-tiered precision prognostic evaluation system for CRC by amalgamating gene expression profiles, clinical characteristics, and tumor microsatellite instability (MSI) status in CRC patients. Methods We integrated genomic data, clinical information, and survival follow-up data from 483 CRC patients obtained from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. MSI-related gene modules were identified using differential expression analysis and Weighted Gene Co-expression Network Analysis (WGCNA). Three prognostic models were constructed: MSI-Related Gene Prognostic Model (Model I), Clinical Prognostic Model (Model II), and Integrated Multi-Layered Prognostic Model (Model III) by combining clinical features. Model performance was assessed and compared using Receiver Operating Characteristic (ROC) curves, Kaplan-Meier analysis, and other methods. Results Six MSI-related genes were selected for constructing Model I (AUC = 0.724); Model II used two clinical features (AUC = 0.684). Compared to individual models, the integrated Model III exhibited superior performance (AUC = 0.825) and demonstrated good stability in an independent dataset (AUC = 0.767). Conclusion This study successfully developed and validated a comprehensive multi-tiered precision prognostic assessment model for CRC, providing an effective tool for personalized medical management of CRC.
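The models above are compared by AUC. As a minimal sketch of what that number measures, the function below computes it as the fraction of (positive, negative) pairs in which the positive case receives the higher score, counting ties as half. The labels and scores are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores for eight patients (1 = event, 0 = no event).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading values such as 0.684 or 0.825.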
Affiliation(s)
- Yonghong Wang
  - Department of Gastrointestinal Surgery, The People's Hospital of Leshan, Leshan, China
8
Yang P, Qiu H, Yang X, Wang L, Wang X. SAGL: A self-attention-based graph learning framework for predicting survival of colorectal cancer patients. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 249:108159. [PMID: 38583291 DOI: 10.1016/j.cmpb.2024.108159] [Received: 09/25/2023] [Revised: 02/28/2024] [Accepted: 03/29/2024] [Indexed: 04/09/2024]
Abstract
BACKGROUND AND OBJECTIVE Colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide. The accurate survival prediction for CRC patients plays a significant role in the formulation of treatment strategies. Recently, machine learning and deep learning approaches have been increasingly applied in cancer survival prediction. However, most existing methods inadequately represent and leverage the dependencies among features and fail to sufficiently mine and utilize the comorbidity patterns of CRC. To address these issues, we propose a self-attention-based graph learning (SAGL) framework to improve the postoperative cancer-specific survival prediction for CRC patients. METHODS We present a novel method for constructing a dependency graph (DG) to reflect two types of dependencies: comorbidity-comorbidity dependencies and the dependencies between features related to patient characteristics and cancer treatments. This graph is subsequently refined by a disease comorbidity network, which offers a holistic view of comorbidity patterns of CRC. A DG-guided self-attention mechanism is proposed to unearth novel dependencies beyond what the DG offers, thus augmenting CRC survival prediction. Finally, each patient will be represented, and these representations will be used for survival prediction. RESULTS The experimental results show that SAGL outperforms state-of-the-art methods on a real-world dataset, with the area under the receiver operating characteristic curve for 3- and 5-year survival prediction reaching 0.849±0.002 and 0.895±0.005, respectively. In addition, the comparison results with different graph neural network-based variants demonstrate the advantages of our DG-guided self-attention graph learning framework. CONCLUSIONS Our study reveals the potential of DG-guided self-attention in optimizing feature graph learning, which can improve the performance of CRC survival prediction.
Affiliation(s)
- Ping Yang
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Hang Qiu
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, PR China; Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Xulin Yang
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Liya Wang
  - Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Xiaodong Wang
  - Department of Gastrointestinal Surgery, West China Hospital, Sichuan University, Chengdu, 610041, PR China
9
Kayikcioglu E, Onder AH, Bacak B, Serel TA. Machine learning for predicting colon cancer recurrence. Surg Oncol 2024; 54:102079. [PMID: 38688191 DOI: 10.1016/j.suronc.2024.102079] [Received: 01/30/2024] [Revised: 03/09/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024]
Abstract
INTRODUCTION Colorectal cancer (CRC) is a global public health concern, ranking among the most commonly diagnosed malignancies worldwide. Despite advancements in treatment modalities, the specter of CRC recurrence remains a significant challenge, demanding innovative solutions for early detection and intervention. The integration of machine learning into oncology offers a promising avenue to address this issue, providing data-driven insights and personalized care. METHODS This retrospective study analyzed data from 396 patients who underwent surgical procedures for colon cancer (CC) between 2010 and 2021. Machine learning algorithms were employed to predict CC recurrence, with a focus on demographic, clinicopathological, and laboratory characteristics. A range of evaluation metrics, including AUC (area under the receiver operating characteristic curve), accuracy, recall, precision, and F1 scores, assessed the performance of machine learning algorithms. RESULTS Significant risk factors for CC recurrence were identified, including sex, carcinoembryonic antigen (CEA) levels, tumor location, depth, lymphatic and venous invasion, and lymph node involvement. The CatBoost Classifier demonstrated exceptional performance, achieving an AUC of 0.92 and an accuracy of 88% on the test dataset. Feature importance analysis highlighted the significance of CEA levels, albumin levels, N stage, weight, platelet count, height, neutrophil count, lymphocyte count, and gender in determining recurrence risk. DISCUSSION The integration of machine learning into healthcare, exemplified by this study's findings, offers a pathway to personalized patient risk stratification and enhanced clinical decision-making. Early identification of individuals at risk of CC recurrence holds the potential for more effective therapeutic interventions and improved patient outcomes. CONCLUSION Machine learning has the potential to revolutionize our approach to CC recurrence prediction, emphasizing the synergy between medical expertise and cutting-edge technology in the fight against cancer. This study represents a vital step toward precision medicine in CC management, showcasing the transformative power of data-driven insights in oncology.
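The evaluation metrics named in the abstract above (accuracy, recall, precision, F1) all derive from the four cells of a binary confusion matrix. The following minimal sketch shows the relationships; the counts are hypothetical and were not taken from the study, and `classification_metrics` is an illustrative name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted recurrences that were real
    recall = tp / (tp + fn)      # share of real recurrences that were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set of 100 patients, 36 of whom had a recurrence.
metrics = classification_metrics(tp=30, fp=6, fn=6, tn=58)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced outcomes, accuracy can stay high while recall is poor, which is why the other three metrics are usually reported alongside it.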
Affiliation(s)
- Erkan Kayikcioglu
  - Department of Medical Oncology, Suleyman Demirel University, Isparta, Turkey
- Arif Hakan Onder
  - Department of Medical Oncology, Health Sciences University Antalya Research and Training Hospital, Antalya, Turkey
- Burcu Bacak
  - Department of Medical Oncology, Suleyman Demirel University, Isparta, Turkey
10
Lu C, Liu L, Yin M, Lin J, Zhu S, Gao J, Qu S, Xu G, Liu L, Zhu J, Xu C. The development and validation of automated machine learning models for predicting lymph node metastasis in Siewert type II T1 adenocarcinoma of the esophagogastric junction. Front Med (Lausanne) 2024; 11:1266278. [PMID: 38633305 PMCID: PMC11021582 DOI: 10.3389/fmed.2024.1266278] [Received: 07/24/2023] [Accepted: 03/15/2024] [Indexed: 04/19/2024] Open
Abstract
Background Lymph node metastasis (LNM) is considered an essential prognosis factor for adenocarcinoma of the esophagogastric junction (AEG), which also affects the treatment strategies of AEG. We aimed to evaluate automated machine learning (AutoML) algorithms for predicting LNM in Siewert type II T1 AEG. Methods A total of 878 patients with Siewert type II T1 AEG were selected from the Surveillance, Epidemiology, and End Results (SEER) database to develop the LNM predictive models. The patients from two hospitals in Suzhou were collected as the test set. We applied five machine learning algorithms to develop the LNM prediction models. The performance of predictive models was assessed using various metrics including accuracy, sensitivity, specificity, the area under the curve (AUC), and receiver operating characteristic (ROC) curve. Results Patients with LNM exhibited a higher proportion of male individuals, a poor degree of differentiation, and submucosal infiltration, with statistical differences. The deep learning (DL) model demonstrated relatively good accuracy (0.713) and sensitivity (0.868) among the five models. Moreover, the DL model achieved the highest AUC (0.781) and sensitivity (1.000) in the test set. Conclusion The DL model showed good predictive performance among five AutoML models, indicating the advantage of AutoML in modeling LNM prediction in patients with Siewert type II T1 AEG.
Affiliation(s)
- Chenghao Lu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
- Lu Liu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
- Minyue Yin
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
  - Department of Gastroenterology, Beijing Friendship Hospital, Capital Medical University, National Clinical Research Center for Digestive Disease, Beijing Digestive Disease Center, State Key Laboratory of Digestive Health, Beijing, China
- Jiaxi Lin
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
- Shiqi Zhu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
- Jingwen Gao
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
- Shuting Qu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
- Guoting Xu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
- Lihe Liu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
- Jinzhou Zhu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
- Chunfang Xu
  - Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China
  - Suzhou Clinical Center of Digestive Diseases, Suzhou, Jiangsu, China
  - The Fourth Affiliated Hospital of Soochow University, Suzhou, China
11
|
Li S, Yi H, Leng Q, Wu Y, Mao Y. New perspectives on cancer clinical research in the era of big data and machine learning. Surg Oncol 2024; 52:102009. [PMID: 38215544 DOI: 10.1016/j.suronc.2023.102009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 10/16/2023] [Indexed: 01/14/2024]
Abstract
In the 21st century, the development of medical science has entered the era of big data, and machine learning has become an essential tool for mining medical big data. The establishment of the SEER database has provided a wealth of epidemiological data for cancer clinical research, and the number of studies based on SEER and machine learning has been growing in recent years. This article reviews recent research based on SEER and machine learning and finds that the current focus of such studies is primarily on the development and validation of models using machine learning algorithms, with the main directions being lymph node metastasis prediction, distant metastasis prediction, and prognosis-related research. Compared to traditional models, machine learning algorithms have the advantage of stronger adaptability, but also suffer from disadvantages such as overfitting and poor interpretability, which need to be weighed in practical applications. At present, machine learning algorithms, as the foundation of artificial intelligence, have just begun to emerge in the field of cancer clinical research. The future development of oncology will enter a more precise era of cancer research, characterized by larger data, higher dimensions, and more frequent information exchange. Machine learning is bound to shine brightly in this field.
Affiliation(s)
- Shujun Li
  - Department of Hematology, Xiangya Hospital, Central South University, Changsha, 410008, China; National Clinical Research Center for Geriatric Diseases (Xiangya Hospital), China; Hunan Hematology Oncology Clinical Medical Research Center, China
- Hang Yi
  - Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Qihao Leng
  - Xiangya School of Medicine, Central South University, Changsha, 410013, Hunan Province, China
- You Wu
  - Institute for Hospital Management, School of Medicine, Tsinghua University, 30 Shuangqing Rd, Haidian District, Beijing, China; Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, 21205, USA
- Yousheng Mao
  - Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
|
12
|
Mohammadi G, Azizmohammad Looha M, Pourhoseingholi MA, Rezaei Tavirani M, Sohrabi S, Zareie Shab Khaneh A, Piri H, Alaei M, Parvani N, Vakilzadeh I, javadi S, Moradian Haft Cheshmeh Z, Razzaghi Z, Mahmoud Robati R, Zamanian Azodi M, Zarean Shahraki S, Talebi R, Charati Yazdani J, Motlagh ME, Khodakarim S, Hadavi M. Classification and Diagnostic Prediction of Colorectal Cancer Mortality Based on Machine Learning Algorithms: A Multicenter National Study. Asian Pac J Cancer Prev 2024; 25:333-342. [PMID: 38285801 PMCID: PMC10911721 DOI: 10.31557/apjcp.2024.25.1.333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 01/19/2024] [Indexed: 01/31/2024] Open
Abstract
INTRODUCTION Colorectal cancer (CRC) ranks as the second leading cause of cancer-related deaths. This study aimed to predict survival outcomes of CRC patients using machine learning (ML) methods. MATERIAL AND METHODS A retrospective analysis included 1853 CRC patients admitted to three prominent tertiary hospitals in Iran from October 2006 to July 2019. Six ML methods, namely logistic regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), Neural Network (NN), Decision Tree (DT), and Light Gradient Boosting Machine (LGBM), were developed with 10-fold cross-validation. Feature selection employed the Random Forest method based on mean decrease GINI criteria. Model performance was assessed using Area Under the Curve (AUC). RESULTS Time from diagnosis, age, tumor size, metastatic status, lymph node involvement, and treatment type emerged as crucial predictors of survival based on mean decrease GINI. The NB (AUC = 0.70, 95% Confidence Interval [CI] 0.65-0.75) and LGBM (AUC = 0.70, 95% CI 0.65-0.75) models achieved the highest predictive AUC values for CRC patient survival. CONCLUSIONS This study highlights the significance of variables including time from diagnosis, age, tumor size, metastatic status, lymph node involvement, and treatment type in predicting CRC survival. The NB model exhibited optimal efficacy in mortality prediction, maintaining a balanced sensitivity and specificity. Policy recommendations encompass early diagnosis and treatment initiation for CRC patients, improved data collection through digital health records and standardized protocols, support for predictive analytics integration in clinical decisions, and the inclusion of identified prognostic variables in treatment guidelines to enhance patient outcomes.
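The abstract above states that the six models were developed with 10-fold cross-validation. As a rough illustration of what that partitioning involves, the sketch below splits sample indices into ten folds in plain Python. The function name `k_fold_indices`, the shuffling seed, and the round-robin fold assignment are assumptions made for illustration; the study does not describe its splitting procedure, and libraries such as scikit-learn provide stratified variants that are usually preferable for imbalanced outcomes.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt round-robin
    into k folds, so fold sizes differ by at most one.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        # Every fold except the held-out one contributes to training.
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 1853 matches the cohort size in the abstract; the split itself is illustrative.
    for fold_number, (train, test) in enumerate(k_fold_indices(1853, k=10), start=1):
        print(f"fold {fold_number}: {len(train)} train, {len(test)} test")
```

Each patient appears in exactly one test fold, so a metric such as AUC can be averaged over the ten held-out folds.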
Affiliation(s)
- Gohar Mohammadi
  - Cancer Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mehdi Azizmohammad Looha
  - Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohammad Amin Pourhoseingholi
  - Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Samaneh Sohrabi
  - Vice Chancellor in Administration and Resources Development Affairs, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amirali Zareie Shab Khaneh
  - Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Hassan Piri
  - Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Maryam Alaei
  - Cardiovascular Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Naser Parvani
  - Vice Chancellor in Administration and Resources Development Affairs, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Iman Vakilzadeh
  - Vice Chancellor in Administration and Resources Development Affairs, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Sara javadi
  - Vice Chancellor for Research & Technology, Shiraz University of Medical Sciences, Shiraz, Iran
- Zahra Razzaghi
  - Laser Application in Medical Sciences Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Reza Mahmoud Robati
  - Department of Dermatology, Director of Skin Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mona Zamanian Azodi
  - Proteomics Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Saba Zarean Shahraki
  - Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Raheleh Talebi
  - Department of Mathematics at Architecture and Computer Engineering, University of Applied Sciences (unit 10), Tehran, Iran
- Mohammad Esmaeil Motlagh
  - Department of Pediatrics, Faculty of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
- Soheila Khodakarim
  - Department of Biostatistics, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Melika Hadavi
  - Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
|
13
|
Guo C, Pan J, Tian S, Gao Y. Using machine learning algorithms to predict 28-day mortality in critically ill elderly patients with colorectal cancer. J Int Med Res 2023; 51:3000605231198725. [PMID: 37950672 PMCID: PMC10640810 DOI: 10.1177/03000605231198725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 08/16/2023] [Indexed: 11/13/2023] Open
Abstract
OBJECTIVE To predict the 28-day mortality of critically ill, elderly patients with colorectal cancer (CRC) using five machine learning approaches. METHODS Data were extracted from the eICU Collaborative Research Database (eICU-CRD) (version 2.0) for a training cohort and from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) and Wuhan Union hospital for validation cohorts. Clinical information (i.e., demographics, initial laboratory tests, vital signs, and outcomes) was collected. Five machine learning algorithms (LightGBM, decision tree, XGBoost, random forest, and an ensemble model) and a logistic regression were applied to predict 28-day mortality. RESULTS Overall, 693 patients were included from the eICU cohort, 181 from the MIMIC-IV cohort, and 95 from the Wuhan Union cohort. Among the six models, the ensemble model exhibited the best predictive ability (AUC, 0.86), followed by random forest (AUC, 0.83) and LightGBM (AUC, 0.82) in the training cohort. The models also showed good predictive performance for 28-day mortality in the validation cohorts. CONCLUSIONS We showed that machine learning algorithms can be used to predict 28-day mortality in critically ill, elderly patients with CRC.
Affiliation(s)
- Chunxia Guo
  - Department of Infectious Disease, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Jun Pan
  - Department of Gastroenterology, Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei 442000, China
- Shan Tian
  - Department of Infectious Disease, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Yuanjun Gao
  - Department of Gastroenterology, Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei 442000, China
|
14
|
Zhang F, Xu B, Peng Y, Mao Z. Clinicopathologic and prognostic factors of patients with T3/T4 colorectal signet ring cell carcinoma: a population-based study. J Cancer Res Clin Oncol 2023; 149:9747-9756. [PMID: 37245170 PMCID: PMC10423144 DOI: 10.1007/s00432-023-04880-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/10/2023] [Accepted: 05/19/2023] [Indexed: 05/29/2023]
Abstract
BACKGROUND This study aimed to evaluate cancer-specific survival (CSS) and to construct a nomogram predicting the CSS of patients with colorectal signet ring cell carcinoma (SRCC). METHODS Data for patients with colorectal SRCC from 2000 to 2019 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Propensity score matching (PSM) was used to minimize bias between SRCC and adenocarcinoma patients. The Kaplan-Meier method and log-rank test were used to estimate the CSS. A nomogram was constructed based on the independent prognostic factors identified by univariate and multivariate Cox proportional hazards regression analyses. The model was evaluated by receiver operating characteristic (ROC) curves and calibration plots. RESULTS Poor CSS was more common in patients with colorectal SRCC, especially in patients with T4/N2 stage, tumor size > 80 mm, grade III-IV, and chemotherapy. Age, T/N stage, and tumor size > 80 mm were identified as independent prognostic indicators. A prognostic nomogram was constructed, and ROC curves and calibration plots validated it as an accurate model for the CSS of patients with colorectal SRCC. CONCLUSION Patients with colorectal SRCC have a poor prognosis. The nomogram is expected to be effective in predicting the survival of patients with colorectal SRCC.
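The abstract above estimates CSS with the Kaplan-Meier method. As a minimal sketch of the product-limit calculation behind such a curve, the Python below multiplies the conditional survival fractions at each observed event time. The function name `kaplan_meier` and the follow-up data are hypothetical; a real analysis would normally use a dedicated survival library and add confidence intervals and the log-rank comparison.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return the Kaplan-Meier curve as (time, survival probability) pairs.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. cancer-specific death) was observed,
             0 if the patient was censored at that time
    Only times with at least one observed event produce a step.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0  # patients leaving the risk set at this time (events + censored)
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: S(t) = S(t-) * (1 - d / n)
            survival *= 1.0 - deaths / n_at_risk
            curve.append((current_time, survival))
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up in months; not data from the study.
    follow_up = [5, 8, 8, 12, 15]
    observed = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(follow_up, observed):
        print(f"t={t}: S(t)={s:.3f}")
```

Censored patients never lower the curve directly; they only shrink the risk set used at later event times.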
Affiliation(s)
- Fan Zhang
  - Department of General Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
- Boqi Xu
  - Department of General Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
- Yao Peng
  - Department of General Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
- Zhongqi Mao
  - Department of General Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
|
15
|
Miyagawa T, Saga M, Sasaki M, Shimizu M, Yamaura A. Statistical and machine learning approaches to predict the necessity for computed tomography in children with mild traumatic brain injury. PLoS One 2023; 18:e0278562. [PMID: 36595496 PMCID: PMC9810188 DOI: 10.1371/journal.pone.0278562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Accepted: 11/18/2022] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Minor head trauma in children is a common reason for emergency department visits, but the risk of traumatic brain injury (TBI) in those children is very low. Therefore, physicians should consider the indication for computed tomography (CT) to avoid unnecessary radiation exposure to children. The purpose of this study was to statistically assess the differences between control and mild TBI (mTBI). In addition, we also investigate the feasibility of machine learning (ML) to predict the necessity of CT scans in children with mTBI. METHODS AND FINDINGS The study enrolled 1100 children under the age of 2 years to assess pre-verbal children. Other inclusion and exclusion criteria were per the PECARN study. Data such as demographics, injury details, medical history, and neurological assessment were used for statistical evaluation and creation of the ML algorithm. The number of children with clinically important TBI (ciTBI), mTBI on CT, and controls was 28, 30, and 1042, respectively. Statistical significance between the control group and clinically significant TBI requiring hospitalization (csTBI: ciTBI+mTBI on CT) was demonstrated for all nonparametric predictors except severity of the injury mechanism. The comparison between the three groups also showed significance for all predictors (p<0.05). This study showed that supervised ML for predicting the need for CT scan can be generated with 95% accuracy. It also revealed the significance of each predictor in the decision tree, especially the "days of life." CONCLUSIONS These results confirm the role and importance of each of the predictors mentioned in the PECARN study and show that ML could discriminate between children with csTBI and the control group.
Affiliation(s)
- Tadashi Miyagawa
  - Department of Pediatric Neurosurgery, Matsudo City General Hospital, Matsudo, Japan
- Marina Saga
  - Department of Neurosurgery, Matsudo City General Hospital, Matsudo, Japan
- Minami Sasaki
  - Department of Neurosurgery, Matsudo City General Hospital, Matsudo, Japan
- Miyuki Shimizu
  - Department of Neurosurgery, Matsudo City General Hospital, Matsudo, Japan
- Akira Yamaura
  - Department of Neurosurgery, Matsudo City General Hospital, Matsudo, Japan
|
16
|
Huang T, Le D, Yuan L, Xu S, Peng X. Machine learning for prediction of in-hospital mortality in lung cancer patients admitted to intensive care unit. PLoS One 2023; 18:e0280606. [PMID: 36701342 PMCID: PMC9879439 DOI: 10.1371/journal.pone.0280606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Accepted: 01/04/2023] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND In-hospital mortality among lung cancer patients admitted to the intensive care unit (ICU) is extremely high. This study used machine learning models to predict in-hospital mortality of critically ill lung cancer patients and so provide relevant information for clinical decision-making. METHODS Data were extracted from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) for a training cohort and from the eICU Collaborative Research Database (eICU-CRD) for a validation cohort. Logistic regression, random forest, decision tree, light gradient boosting machine (LightGBM), eXtreme gradient boosting (XGBoost), and an ensemble (random forest+LightGBM+XGBoost) model were used to predict in-hospital mortality and extract important features. The AUC (area under the receiver operating characteristic curve), accuracy, F1 score, and recall were used to evaluate the predictive performance of each model. Shapley Additive exPlanations (SHAP) values were calculated to evaluate the importance of each feature. RESULTS Overall, there were 653 (24.8%) in-hospital deaths in the training cohort and 523 (21.7%) in the validation cohort. Among the six machine learning models, the ensemble model achieved the best performance. The top 5 most influential features in the random forest and XGBoost models were the sequential organ failure assessment (SOFA) score, albumin, the Oxford Acute Severity of Illness Score (OASIS), anion gap, and bilirubin. The SHAP summary plot was used to illustrate the positive or negative effects of the top 15 features in the XGBoost model. CONCLUSION The ensemble model performed best and might be applied to forecast in-hospital mortality of critically ill lung cancer patients, and the SOFA score was the most important feature in all models. These results might offer a valuable reference for ICU clinicians' decision-making in advance.
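The abstract above names an ensemble of random forest, LightGBM, and XGBoost but does not say how the three outputs are combined. One common choice is soft voting, that is, averaging each model's predicted probability per patient. The sketch below illustrates that option only as an assumption; the function names `soft_vote` and `classify`, the equal weights, the 0.5 threshold, and the probabilities are all hypothetical and are not taken from the study.

```python
from typing import Dict, List


def soft_vote(model_probs: Dict[str, List[float]]) -> List[float]:
    """Average predicted probabilities across models, patient by patient.

    model_probs maps a model name to its predicted probability of
    in-hospital death for each patient, in the same patient order.
    Equal weighting is an assumption of this sketch.
    """
    if not model_probs:
        raise ValueError("at least one model is required")
    lengths = {len(probs) for probs in model_probs.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same number of patients")
    n_models = len(model_probs)
    return [sum(per_patient) / n_models for per_patient in zip(*model_probs.values())]


def classify(probs: List[float], threshold: float = 0.5) -> List[int]:
    """Turn averaged probabilities into 0/1 predictions at a chosen threshold."""
    return [1 if p >= threshold else 0 for p in probs]


if __name__ == "__main__":
    # Made-up probabilities for three patients.
    predictions = {
        "random_forest": [0.2, 0.9, 0.6],
        "lightgbm": [0.4, 0.7, 0.3],
        "xgboost": [0.3, 0.8, 0.3],
    }
    averaged = soft_vote(predictions)
    print(averaged, classify(averaged))
```

For the third patient, one model alone would predict death at the 0.5 threshold, while the averaged probability of about 0.4 does not; this smoothing of individual model errors is the usual motivation for an ensemble.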
Affiliation(s)
- Tianzhi Huang
  - Department of Rehabilitation, The Second Affiliated Hospital of Jianghan University, Wuhan, China
- Dejin Le
  - Department of Respiratory Medicine, People’s Hospital of Daye, The Second Affiliated Hospital of Hubei Polytechnic University, Daye, Hubei, China
- Lili Yuan
  - Department of Anesthesiology, The Second Affiliated Hospital of Jianghan University, Wuhan, China
- Shoujia Xu
  - Department of Orthopedics, Sinopharm Dongfeng General Hospital, Hubei University of Medicine, Shiyan, Hubei, China
- Xiulan Peng
  - Department of Oncology, The Second Affiliated Hospital of Jianghan University, Wuhan, China
|
17
|
Gong X, Zheng B, Xu G, Chen H, Chen C. Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer. J Thorac Dis 2022; 13:6240-6251. [PMID: 34992804 PMCID: PMC8662490 DOI: 10.21037/jtd-21-1107] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/05/2021] [Accepted: 09/24/2021] [Indexed: 01/15/2023]
Abstract
Background Accurate prognostic estimation for esophageal cancer (EC) patients plays an important role in the process of clinical decision-making. The objective of this study was to develop an effective model to predict the 5-year survival status of EC patients using machine learning (ML) algorithms. Methods We retrieved the information of patients diagnosed with EC between 2010 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) Program, including 24 features. A total of 8 ML models were applied to the selected dataset to classify the EC patients in terms of 5-year survival status: 3 newly developed gradient boosting models (GBM), namely XGBoost, CatBoost, and LightGBM; 2 commonly used tree-based models, namely gradient boosting decision trees (GBDT) and random forest (RF); and 3 other ML models, namely artificial neural networks (ANN), naive Bayes (NB), and support vector machines (SVM). A 5-fold cross-validation was used in model performance measurement. Results After excluding records with missing data, the final study population comprised 10,588 patients. Feature selection was conducted based on the χ² test; however, the experimental results showed that the complete dataset provided better prediction of outcomes than the dataset with non-significant features removed. Among the 8 models, XGBoost had the best performance [area under the receiver operating characteristic (ROC) curve (AUC): 0.852 for XGBoost, 0.849 for CatBoost, 0.850 for LightGBM, 0.846 for GBDT, 0.838 for RF, 0.844 for ANN, 0.833 for NB, and 0.789 for SVM]. The accuracy and logistic loss of XGBoost were 0.875 and 0.301, respectively, which were also the best performances. In the XGBoost model, the SHapley Additive exPlanations (SHAP) value was calculated, and the result indicated that four features (reason no cancer-directed surgery, Surg Prim Site, age, and stage group) had the greatest impact on predicting the outcomes.
Conclusions The XGBoost model and the complete dataset can be used to construct an accurate prognostic model for patients diagnosed with EC which may be applicable in clinical practice in the future.
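The abstract above ranks models by AUC and logistic loss. As a small sketch of what those two numbers measure, the Python below computes AUC as the fraction of positive/negative pairs the model orders correctly (the Mann-Whitney formulation) and logistic loss as the mean negative log-likelihood. The function names `roc_auc` and `log_loss`, the clipping constant, and the example scores are assumptions for illustration and are unrelated to the study's data.

```python
import math
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is O(n_pos * n_neg), fine for a small illustration.
    """
    positives = [s for label, s in zip(y_true, scores) if label == 1]
    negatives = [s for label, s in zip(y_true, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def log_loss(y_true: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the true labels under predicted probabilities."""
    if len(y_true) != len(probs):
        raise ValueError("y_true and probs must have the same length")
    total = 0.0
    for label, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    predicted = [0.1, 0.4, 0.35, 0.8]
    print(f"AUC = {roc_auc(labels, predicted):.3f}")
    print(f"log loss = {log_loss(labels, predicted):.3f}")
```

AUC depends only on how patients are ranked, whereas logistic loss also penalizes poorly calibrated probabilities, which is why the two metrics can disagree about the best model.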
Affiliation(s)
- Xian Gong
  - Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China; Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
- Bin Zheng
  - Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China; Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
- Guobing Xu
  - Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China; Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
- Hao Chen
  - Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China; Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
- Chun Chen
  - Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China; Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
|