Abstract
BACKGROUND AND AIM
Colorectal cancer (CRC) is among the most prevalent and deadliest cancers. Early prediction of metastasis in patients with CRC is crucial for preventing progression to advanced stages and improving prognosis. Previous studies have predicted metastasis in CRC patients using clinical data. The current study combines demographic, lifestyle, nutritional, and clinical factors, including diagnostic and therapeutic factors, to construct a machine learning (ML) model with greater predictive insight and generalizability than previous ones.
MATERIALS AND METHODS
In this retrospective study, we used data from 1156 CRC patients referred to the Masoud internal clinic in Tehran City from January 2017 to December 2023. Eight machine learning algorithms (LightGBM, XGBoost, random forest, artificial neural network, support vector machine, decision tree, K-nearest neighbor, and logistic regression) were used to build models for predicting metastasis among CRC patients. To improve clinical usability, we also assessed the contribution of individual features using the best-performing model. To test the generalizability of the resulting model, we used data from 115 CRC patients from Imam Khomeini Hospital in Sari City and assessed the predictive ability of LightGBM, the best-performing model, on this external cohort.
RESULTS
The LightGBM model achieved satisfactory predictive performance, with a PPV of 97.32%, NPV of 84.67%, sensitivity of 83.14%, specificity of 93.14%, accuracy of 88.14%, F1-score of 87.51%, and an AU-ROC of 0.9 ± 0.01. History of IBD, family history of CRC, number of lymph nodes involved, fruit intake, and tumor size were identified as the stronger predictors of metastasis in colorectal cancer, supporting clinical usability. The external validation cohort showed a PPV of 0.8, NPV of 0.85, sensitivity of 0.78, specificity of 0.86, accuracy of 0.834, F1-score of 0.795, and AU-ROC of 0.77 ± 0.03, demonstrating satisfactory generalizability to data from another clinical setting.
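For readers less familiar with the reported metrics, the following minimal sketch shows how PPV, NPV, sensitivity, specificity, accuracy, and F1-score follow from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage example are hypothetical and chosen only for illustration; they are not the study's data, and the sketch assumes every denominator is nonzero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Assumes every denominator below is nonzero.
    """
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (tn + fn)            # negative predictive value
    sensitivity = tp / (tp + fn)    # recall, true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "ppv": ppv,
        "npv": npv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = binary_metrics(tp=40, fp=10, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AU-ROC is not derivable from a single confusion matrix, since it summarizes performance across all decision thresholds and requires the model's predicted scores.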
CONCLUSION
The empirical results indicate that LightGBM has predictive capability that physicians can leverage in clinical settings for early prediction of metastasis and improved prognosis in patients with colorectal cancer.
CLINICAL TRIAL NUMBER
Not applicable.