1
Gündoğdu H, Panç K, Sekmen S, Er H, Gürün E. Enhancing bone metastasis prediction in prostate cancer using quantitative mpMRI features, ISUP grade and PSA density: a machine learning approach. Abdom Radiol (NY) 2025; 50:2221-2231. [PMID: 39542946 DOI: 10.1007/s00261-024-04667-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2024] [Revised: 10/27/2024] [Accepted: 10/28/2024] [Indexed: 11/17/2024]
Abstract
PURPOSE: Bone metastasis is a critical complication in prostate cancer, significantly impacting patient prognosis and quality of life. This study aims to enhance bone metastasis prediction using machine learning (ML) techniques by integrating dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) perfusion features, International Society of Urological Pathology (ISUP) grade, and prostate-specific antigen (PSA) density.
MATERIALS AND METHODS: A retrospective analysis was conducted on 122 patients with histopathologically confirmed prostate cancer who underwent multiparametric prostate magnetic resonance imaging (mpMRI). Quantitative mpMRI features, PSA density, and ISUP grades were extracted and normalized. The dataset was balanced using oversampling and divided into training (70%) and test (30%) sets. Various ML models were developed and evaluated using area under the curve (AUC) metrics.
RESULTS: Bone metastases were present in 26 patients (21.3%) at diagnosis. IAUGC and MaxSlope showed a statistically significant association with bone metastasis (p = 0.035 and p = 0.050, respectively). The optimal PSA density cut-off value of 0.24 yielded a sensitivity of 0.88, specificity of 0.60, and AUC of 0.77. Machine learning models were developed using the dataset created with IAUGC, MaxSlope, ISUP grade, and PSA density values. Among the ML models, XGBoost demonstrated superior performance with validation and test AUCs of 91.5% and 92.6%, respectively, along with high precision (93.3%) and recall (93.1%).
CONCLUSION: Integrating quantitative mpMRI features, ISUP grade, and PSA density through machine learning algorithms, particularly XGBoost, significantly improves the accuracy of bone metastasis prediction in prostate cancer patients. This approach can potentially reduce the need for additional imaging modalities and associated radiation exposure.
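To make the cut-off result concrete, the following minimal sketch shows how sensitivity and specificity are computed once a threshold such as the reported PSA density cut-off of 0.24 is fixed. The function name and the toy values are hypothetical and invented for illustration; they are not the study's data, and the abstract does not state how the optimal cut-off was selected.

```python
def sensitivity_specificity(values, labels, cutoff):
    """Treat value >= cutoff as a positive prediction and score it against labels (1 = metastasis)."""
    tp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
    fn = sum(1 for v, y in zip(values, labels) if v < cutoff and y == 1)
    tn = sum(1 for v, y in zip(values, labels) if v < cutoff and y == 0)
    fp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true metastasis cases that are flagged
    specificity = tn / (tn + fp)  # share of metastasis-free cases that are cleared
    return sensitivity, specificity


# Invented toy data, NOT taken from the paper.
psa_density = [0.10, 0.15, 0.20, 0.30, 0.25, 0.40, 0.50, 0.18]
metastasis = [0, 0, 0, 0, 1, 1, 1, 1]

sens, spec = sensitivity_specificity(psa_density, metastasis, cutoff=0.24)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sweeping the cut-off over all observed values and plotting sensitivity against 1 - specificity traces the ROC curve whose area the abstract reports as the AUC.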
Affiliation(s)
- Kemal Panç
- Karakoçan State Hospital, Elazig, Turkey
- Hüseyin Er
- Recep Tayyip Erdoğan University, Rize, Turkey
2
Soleimani S, Bahrami M, Vali M. Survival prediction from imbalanced colorectal cancer dataset using hybrid sampling methods and tree-based classifiers. Sci Rep 2025; 15:14554. [PMID: 40281195 PMCID: PMC12032297 DOI: 10.1038/s41598-025-98703-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Accepted: 04/14/2025] [Indexed: 04/29/2025] Open
Abstract
Colorectal cancer is a high-mortality cancer, with a mortality rate of 64.5% for all stages combined. Clinical data analysis plays a crucial role in predicting the survival of colorectal cancer patients, enabling clinicians to make informed treatment decisions. However, utilizing clinical data can be challenging, especially when dealing with imbalanced outcomes, an aspect often overlooked in this context. This paper focuses on developing algorithms to predict 1-, 3-, and 5-year survival of colorectal cancer patients using clinical datasets, with particular emphasis on the highly imbalanced 1-year survival prediction task. We utilized a colorectal cancer dataset from the Surveillance, Epidemiology, and End Results (SEER) database, which exhibits high imbalance in the 1-year (1:10) survival analysis and an imbalance in the 3-year (2:10) analysis, achieving balance in the 5-year analysis. The pre-processing step consists of removing records with missing values and merging categories with less than 2% share for each categorical feature to limit the number of classes of each component. Edited Nearest Neighbor, Repeated Edited Nearest Neighbor (RENN), Synthetic Minority Over-sampling Technique (SMOTE), and pipelines of SMOTE and RENN approaches were used for balancing the data with tree-based classifiers, including Decision Tree, Random Forest, Extra Tree, eXtreme Gradient Boosting, and Light Gradient Boosting Machine (LGBM). The performance evaluation utilizes a 5-fold cross-validation approach. For 1-year survival, our proposed method with LGBM significantly outperforms other sampling methods, with a sensitivity of 72.30%. For 3-year survival, the combination of RENN and LGBM achieves a sensitivity of 80.81%, indicating that our proposed method works best for highly imbalanced datasets. Additionally, when predicting 5-year survival, the sensitivity reaches 63.03% using LGBM.
Our proposed method significantly improves mortality prediction for the minority class of colorectal cancer patients. RENN followed by SMOTE yields better sensitivity in the classifiers, with LGBM as the predictor performing best for 1- and 3-year survival. In the 5-year task, LGBM outperforms other models in terms of F1-score.
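The pre-processing step described in the abstract (merging categories whose share is below 2%) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the `"Other"` label, and the toy column are hypothetical, and the abstract does not say what name the merged category received.

```python
from collections import Counter


def merge_rare_categories(values, min_share=0.02, other_label="Other"):
    """Replace every category whose relative frequency is below min_share with other_label."""
    counts = Counter(values)
    total = len(values)
    # Categories that make up less than min_share of all records.
    rare = {cat for cat, n in counts.items() if n / total < min_share}
    return [other_label if v in rare else v for v in values]


# Invented toy column: category "C" accounts for 1% of the records.
column = ["A"] * 60 + ["B"] * 39 + ["C"] * 1
merged = merge_rare_categories(column)
print(Counter(merged))
```

Merging before resampling keeps the number of levels per categorical feature small, which matters for neighbor-based methods such as RENN and SMOTE that rely on distances between records.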
Affiliation(s)
- Sadegh Soleimani
- Department of Biomedical Engineering, Faculty of Electrical Engineering, K. N. Toosi University of Technology, 16315-1355, 1631714191, Tehran, Iran
- Mahsa Bahrami
- Department of Biomedical Engineering, Faculty of Electrical Engineering, K. N. Toosi University of Technology, 16315-1355, 1631714191, Tehran, Iran
- Mansour Vali
- Department of Biomedical Engineering, Faculty of Electrical Engineering, K. N. Toosi University of Technology, 16315-1355, 1631714191, Tehran, Iran.
3
Ahuja G, Kaur I, Lamba PS, Virmani D, Jain A, Chakraborty S, Mallik S. Prostate cancer prognosis using machine learning: A critical review of survival analysis methods. Pathol Res Pract 2024; 264:155687. [PMID: 39541766 DOI: 10.1016/j.prp.2024.155687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2024] [Accepted: 10/25/2024] [Indexed: 11/16/2024]
Abstract
Prostate cancer is a disease that affects the male reproductive system. The irregularity of its symptoms makes it hard for clinicians to pinpoint the disease in its earlier stages. Techniques such as machine learning, data science, and deep learning have been applied to biomedical data to identify patients' symptoms and to predict their disease stage and chances of survival. Survival analysis of prostate cancer is essential because it guides clinicians in recommending the optimal treatment for the patient. Building an accurate model from electronic data using machine learning is quite difficult. This review article presents a systematic literature review focused on prostate cancer survival analysis utilizing machine learning and other soft computing techniques. Through an extensive evaluation of the available research, we have identified and summarized key insights from the selected studies. A comprehensive comparison of various approaches for survival and treatment predictions in the literature has been conducted. Additionally, the gaps in previous research have been discussed, highlighting areas for further investigation and providing future recommendations. By synthesizing the current knowledge in prostate cancer survival analysis, this review contributes to the understanding of the field and lays the foundation for future advancements.
Affiliation(s)
- Garvita Ahuja
- Vivekananda Institute of Professional Studies, Technical Campus, New Delhi 110034, India.
- Ishleen Kaur
- Sri Guru Tegh Bahadur Khalsa College, University of Delhi, Delhi 110007, India.
- Puneet Singh Lamba
- Sri Guru Tegh Bahadur Khalsa College, University of Delhi, Delhi 110007, India.
- Deepali Virmani
- Department of IT, Guru Tegh Bahadur Institute of Technology, India.
- Achin Jain
- Bharati Vidyapeeth's College of Engineering, New Delhi 110063, India.
- Somenath Chakraborty
- Department of Computer Science and Information Systems, The West Virginia University Institute of Technology, Beckley, WV, USA.
- Saurav Mallik
- Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA 02115, USA; Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ 85721, USA.
4
Kuizinienė D, Savickas P, Kunickaitė R, Juozaitienė R, Damaševičius R, Maskeliūnas R, Krilavičius T. A comparative study of feature selection and feature extraction methods for financial distress identification. PeerJ Comput Sci 2024; 10:e1956. [PMID: 38855232 PMCID: PMC11157601 DOI: 10.7717/peerj-cs.1956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 03/04/2024] [Indexed: 06/11/2024]
Abstract
Financial distress identification remains an essential topic in the scientific literature due to its importance for society and the economy. The advancements in information technology and the escalating volume of stored data have led to the emergence of financial distress that transcends the realm of financial statements and their indicators (ratios). The feature space could be expanded by incorporating new perspectives on feature data categories such as macroeconomics, sectors, social, board, management, judicial incident, etc. However, the increased dimensionality results in sparse data and overfitted models. This study proposes a new approach for efficient financial distress classification assessment by combining dimensionality reduction and machine learning techniques. The proposed framework aims to identify a subset of features leading to the minimization of the loss function describing the financial distress in an enterprise. During the study, 15 dimensionality reduction techniques with different numbers of features and 17 machine-learning models were compared. Overall, 1,432 experiments were performed using Lithuanian enterprise data covering the period from 2015 to 2022. Results revealed that the artificial neural network (ANN) model with 30 ranked features identified using the Random Forest mean decreasing Gini (RF_MDG) feature selection technique provided the highest AUC score. Moreover, this study has introduced a novel approach for feature extraction, which could improve financial distress classification models.
Affiliation(s)
- Dovilė Kuizinienė
- Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
- Paulius Savickas
- Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
- Rimantė Kunickaitė
- Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
- Rūta Juozaitienė
- Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
- Tomas Krilavičius
- Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
5
P D, C G. A systematic review on machine learning and deep learning techniques in cancer survival prediction. Prog Biophys Mol Biol 2022; 174:62-71. [PMID: 35933043 DOI: 10.1016/j.pbiomolbio.2022.07.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/07/2022] [Revised: 07/13/2022] [Accepted: 07/19/2022] [Indexed: 06/15/2023]
Abstract
Cancer is a disease characterised by the abnormal and uncontrolled growth of body cells. It usually develops asymptomatically and spreads to other parts of the body. The major problem in treating cancer is that its progress is not monitored once it is diagnosed. Prognosis can be estimated through survival analysis, the branch of statistics that deals with predicting the time until an event occurs. In cancer prognosis, the quantity of interest is the survival time of the patient from the onset of the disease, or the time to recurrence of the disease after treatment. This study surveys the machine learning and deep learning models used to provide a prognosis for cancer patients.
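As a concrete instance of time-to-event estimation, the sketch below implements the Kaplan-Meier product-limit estimator in plain Python. Kaplan-Meier is a standard survival-analysis baseline chosen here for illustration; the abstract does not name it, and the function name and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  -- follow-up time of each patient
    events -- 1 if the event (e.g. death) was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # events and censored cases both leave the risk set
    return curve


# Invented toy cohort: the patient with time 2 is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients contribute to the risk set until they drop out but never count as events, which is the feature that distinguishes survival analysis from ordinary regression on observed times.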
Affiliation(s)
- Deepa P
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Gunavathi C
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
6
Parr H, Hall E, Porta N. Joint models for dynamic prediction in localised prostate cancer: a literature review. BMC Med Res Methodol 2022; 22:245. [PMID: 36123621 PMCID: PMC9487103 DOI: 10.1186/s12874-022-01709-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/03/2021] [Accepted: 08/10/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Prostate cancer is a very prevalent disease in men. Patients are monitored regularly during and after treatment with repeated assessment of prostate-specific antigen (PSA) levels. Prognosis of localised prostate cancer is generally good after treatment, and the risk of having a recurrence is usually estimated based on factors measured at diagnosis. Incorporating PSA measurements over time in a dynamic prediction joint model enables updates of patients' risk as new information becomes available. We review joint model strategies that have been applied to model time-dependent PSA trajectories to predict time-to-event outcomes in localised prostate cancer.
METHODS: We identify articles that developed joint models for prediction of localised prostate cancer recurrence over the last two decades. We report, compare, and summarise the methodological approaches and applications that use joint modelling accounting for two processes: the longitudinal model (PSA), and the time-to-event process (clinical failure). The methods explored differ in how they specify the association between these two processes.
RESULTS: Twelve relevant articles were identified. A range of methodological frameworks were found, and we describe in detail shared-parameter joint models (9 of 12, 75%) and joint latent class models (3 of 12, 25%). Within each framework, these articles presented model development, estimation of dynamic predictions and model validations.
CONCLUSIONS: Each framework has its unique principles with corresponding advantages and differing interpretations. Regardless of the framework used, dynamic prediction models enable real-time prediction of individual patient prognosis. They utilise all available longitudinal information, in addition to baseline prognostic risk factors, and are superior to traditional baseline-only prediction models.
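For orientation, a shared-parameter joint model of the kind reviewed here is commonly written in the generic textbook form below. This is a sketch under stated assumptions, not the specification of any of the twelve reviewed articles: all symbols are introduced here for illustration, and individual articles may link the two processes differently (for example, through the slope of the trajectory).

```latex
\begin{align}
y_i(t) &= m_i(t) + \varepsilon_i(t), \qquad \varepsilon_i(t) \sim \mathcal{N}(0, \sigma^2), \\
m_i(t) &= x_i(t)^\top \beta + z_i(t)^\top b_i, \qquad b_i \sim \mathcal{N}(0, D), \\
h_i(t) &= h_0(t) \exp\!\left( \gamma^\top w_i + \alpha\, m_i(t) \right), \\
\pi_i(u \mid t) &= \Pr\!\left( T_i \ge u \mid T_i > t,\ \mathcal{Y}_i(t),\ w_i \right), \qquad u > t.
\end{align}
```

Here $y_i(t)$ is the observed (possibly transformed) PSA value of patient $i$ at time $t$, $m_i(t)$ is the underlying trajectory with fixed effects $\beta$ and patient-specific random effects $b_i$, $h_i(t)$ is the hazard of clinical failure with baseline hazard $h_0(t)$ and baseline covariates $w_i$, and $\alpha$ measures the association between the two processes. The dynamic prediction $\pi_i(u \mid t)$ is the probability of remaining failure-free up to time $u$ given the PSA history $\mathcal{Y}_i(t)$ observed so far, and it is updated each time a new measurement arrives.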
Affiliation(s)
- Harry Parr
- Clinical Trials and Statistics Unit at The Institute of Cancer Research, London, UK
- Emma Hall
- Clinical Trials and Statistics Unit at The Institute of Cancer Research, London, UK
- Nuria Porta
- Clinical Trials and Statistics Unit at The Institute of Cancer Research, London, UK
7
Liu L, Qiao C, Zha JR, Qin H, Wang XR, Zhang XY, Wang YO, Yang XM, Zhang SL, Qin J. Early prediction of clinical scores for left ventricular reverse remodeling using extreme gradient random forest, boosting, and logistic regression algorithm representations. Front Cardiovasc Med 2022; 9:864312. [PMID: 36061535 PMCID: PMC9428443 DOI: 10.3389/fcvm.2022.864312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Accepted: 07/13/2022] [Indexed: 11/13/2022] Open
Abstract
Objective: At present, there is no early prediction model of left ventricular reverse remodeling (LVRR) for patients with heart failure and an ejection fraction (EF) of ≤35% at first diagnosis; the purpose of this article is to supplement existing research.
Materials and methods: A total of 109 patients with heart failure and an EF of ≤35% at first diagnosis were included in this single-center study. LVRR was defined as an absolute increase in left ventricular ejection fraction (LVEF) of ≥10% to a final value of >35%. Analysis features included demographic characteristics, diseases, biochemical data, echocardiography, and drug therapy. Extreme gradient boosting (XGBoost), random forest, and logistic regression models were used to distinguish between LVRR and non-LVRR cases and to obtain the most important features.
Results: There were 47 cases (42%) of LVRR in patients suffering from heart failure with an EF of ≤35% at first diagnosis after optimal drug therapy. General statistical analysis and machine learning methods were combined to exclude a number of significant feature groups. The median duration of disease in the LVRR group was significantly lower than that in the non-LVRR group (7 vs. 48 months); the mean values of creatine kinase (CK) and MB isoenzyme of creatine kinase (CK-MB) in the LVRR group were lower than those in the non-LVRR group (80.11 vs. 94.23 U/L; 2.61 vs. 2.99 ng/ml; 27.19 vs. 28.54 mm). Moreover, the AUC values for our feature combinations were 97%, 94%, and 87% when using the XGBoost, random forest, and logistic regression techniques, respectively. The ablation test revealed that beats per minute (BPM) and disease duration had a greater impact on the model's ability to accurately forecast outcomes.
Conclusion: Shorter disease duration, slightly lower CK and CK-MB levels, slightly smaller right and left ventricular and left atrial dimensions, and lower mean heart rates (BPM) were found to be most strongly predictive of LVRR development.
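The outcome label used by the classifiers can be sketched as a simple rule. The snippet below reads the abstract's LVRR definition as an absolute LVEF increase of at least 10 percentage points ending at a final value above 35%; that reading, the function name, and the example values are assumptions made here for illustration.

```python
def is_lvrr(baseline_lvef, final_lvef, min_increase=10.0, final_threshold=35.0):
    """Return True when the LVEF change meets the LVRR definition as read here.

    baseline_lvef, final_lvef -- LVEF in percent at first diagnosis and at follow-up
    """
    absolute_increase = final_lvef - baseline_lvef
    return absolute_increase >= min_increase and final_lvef > final_threshold


# Invented example values, not patient data.
print(is_lvrr(30, 45))  # increase of 15 points, final value above 35
print(is_lvrr(30, 38))  # final value above 35, but increase of only 8 points
print(is_lvrr(20, 33))  # increase of 13 points, but final value not above 35
```

Both conditions must hold, so a large relative improvement that still ends at or below 35% is labeled non-LVRR.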
Affiliation(s)
- Lu Liu
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
- Cen Qiao
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
- Jun-Ren Zha
- School of Software Engineering, Dalian University, Dalian, China
- Huan Qin
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
- Xiao-Rui Wang
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
- Xin-Yu Zhang
- Medical College, Dalian University, Dalian, China
- Yi-Ou Wang
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
- Xiu-Mei Yang
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
- Shu-Long Zhang
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
- Jing Qin
- School of Software Engineering, Dalian University, Dalian, China