1
Peimankar A, Garvik OS, Nørgård BM, Søndergaard J, Jarbøl DE, Wehberg S, Sheikh SP, Ebrahimi A, Wiil UK, Iachina M. Prescription data and demographics: An explainable machine learning exploration of colorectal cancer risk factors based on data from Danish national registries. Computer Methods and Programs in Biomedicine 2025; 267:108774. [PMID: 40287990 DOI: 10.1016/j.cmpb.2025.108774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2024] [Revised: 02/23/2025] [Accepted: 04/10/2025] [Indexed: 04/29/2025]
Abstract
OBJECTIVES Despite substantial advancements in both treatment and prevention, colorectal cancer continues to be a leading cause of global morbidity and mortality. This study investigated the potential of using demographics and prescribed drug information to predict risk of colorectal cancer using a machine learning approach. METHODS Five different machine learning algorithms, including Logistic Regression, XGBoost, Random Forests, kNN, and Voting Classifier, were initially developed and evaluated for their predictive capabilities across various time horizons (3, 6, 12, and 36 months). To enhance transparency and interpretability, explainable techniques were employed to understand the model's predictions and identify the relative contributions of factors like age, sex, social status, and prescribed medications, promoting trust and clinical insights. While all developed models, including simpler ones such as Logistic Regression, demonstrated comparable performance, the Voting Classifier, as an ensemble model, was selected for further investigation due to its inherent diversity and generalizability. This ensemble model combines predictions from multiple base models, reducing the risk of overfitting and improving the robustness of the final prediction. RESULTS The model demonstrated consistent performance across these time horizons, achieving a precision consistently above 0.99, indicating that patients flagged as at risk were almost always true cases. However, the recall remained relatively low (around 0.6), highlighting the model's limitations in comprehensively identifying all at-risk patients, despite its high precision. This calls for additional investigation in future studies to further improve the performance of the proposed model. CONCLUSION Machine learning models can identify individuals at higher risk for developing colorectal cancer, enabling earlier interventions and personalized risk management strategies. However, further studies are needed before implementation in clinical practice.
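The abstract above contrasts a precision above 0.99 with a recall near 0.6 for an ensemble that combines several base models. As a minimal sketch of what that combination means, the following uses hard (majority) voting over three hypothetical base models on invented toy labels. The abstract does not say whether the Voting Classifier used hard or soft voting, so that choice, the function names, and all data here are assumptions for illustration only.

```python
def majority_vote(predictions_per_model):
    """Hard voting: each model casts one 0/1 vote per patient.

    A patient is flagged (1) only when strictly more than half of the
    models vote 1. Hard voting is an assumption, not the paper's stated setup.
    """
    n_models = len(predictions_per_model)
    return [1 if 2 * sum(votes) > n_models else 0
            for votes in zip(*predictions_per_model)]


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data: 5 true cases followed by 5 non-cases.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_b = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1]
model_c = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])
precision, recall = precision_recall(y_true, ensemble)
print(ensemble, precision, recall)
```

In this toy case the ensemble raises no false alarms but misses two of the five true cases, which is the same high-precision, lower-recall pattern the abstract describes.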
Affiliation(s)
- Abdolrahman Peimankar
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, 5230 Odense, Denmark
- Olav Sivertsen Garvik
  - Center for Clinical Epidemiology, Odense University Hospital, 5230 Odense, Denmark; Research Unit of Clinical Epidemiology, University of Southern Denmark, 5230 Odense, Denmark
- Bente Mertz Nørgård
  - Center for Clinical Epidemiology, Odense University Hospital, 5230 Odense, Denmark; Research Unit of Clinical Epidemiology, University of Southern Denmark, 5230 Odense, Denmark
- Jens Søndergaard
  - Research Unit of General Practice, Department of Public Health, University of Southern Denmark, 5230 Odense, Denmark
- Dorte Ejg Jarbøl
  - Research Unit of General Practice, Department of Public Health, University of Southern Denmark, 5230 Odense, Denmark
- Sonja Wehberg
  - Research Unit of General Practice, Department of Public Health, University of Southern Denmark, 5230 Odense, Denmark
- Søren Paludan Sheikh
  - Center for Regenerative Medication, Odense University Hospital, 5230 Odense, Denmark
- Ali Ebrahimi
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, 5230 Odense, Denmark
- Uffe Kock Wiil
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, 5230 Odense, Denmark
- Maria Iachina
  - Center for Clinical Epidemiology, Odense University Hospital, 5230 Odense, Denmark; Research Unit of Clinical Epidemiology, University of Southern Denmark, 5230 Odense, Denmark
2
Shweikeh F, Zeng Y, Jabir AR, Whittenberger E, Kadatane SP, Huang Y, Mouchli M, Castillo DR. The emerging role of blood-based biomarkers in early detection of colorectal cancer: A systematic review. Cancer Treat Res Commun 2025; 42:100872. [PMID: 39892077 DOI: 10.1016/j.ctarc.2025.100872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 11/03/2024] [Accepted: 01/22/2025] [Indexed: 02/03/2025]
Abstract
BACKGROUND Colorectal cancer (CRC), the third most commonly diagnosed and second most lethal cancer worldwide, necessitates efficient early detection strategies to improve patient outcomes. This review evaluates the promise of novel blood-based biomarkers for early detection of CRC. METHODS A systematic review, registered with PROSPERO (CRD42024513770) and adhering to PRISMA guidelines, was conducted across multiple databases from January 1st, 2020 to December 31st, 2022. The comprehensive search strategy centered on sensitivity, specificity, and AUC-ROC of multiple types of molecular blood biomarkers. RESULTS Of a total of 142 included articles, 59 were on protein, 58 on RNA, and 21 on DNA. The investigation into DNA biomarkers revealed that cfDNA and ctDNA carry significant potential for early CRC diagnosis. For instance, methylation patterns in genes such as MYO1-G and NDRG4 exhibited high diagnostic accuracies with AUCs reaching up to 0.996. RNA biomarkers like miRNAs and circRNAs also showed promising results, with circ_0011536 achieving an AUC of 0.982. Protein biomarkers, contrasted with established cancer markers, unveiled notable candidates like Irisin and ANXA2, with AUCs surpassing 0.96. The review highlights several individual markers and panels with the potential to improve upon existing CRC screening tests. CONCLUSIONS Despite the promise shown by the novel biomarkers, challenges persist, including small sample sizes, potential selection biases, and a lack of comprehensive cost-effectiveness analysis. Future research should focus on large-scale, multicenter, prospective studies across diverse populations. The findings advocate for an integrated biomarker approach, potentially revolutionizing CRC screening and aligning it with clinical realities through rigorous validation.
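The review above compares biomarkers by AUC-ROC. For readers less familiar with the metric, here is a small sketch of one standard way to compute it: the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as half. The function name and the biomarker scores are invented for illustration and are not taken from any study in the review.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Each pair where the case scores higher counts 1, a tie counts 0.5.
    This pairwise form is equivalent to the area under the ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical biomarker levels (arbitrary units), not real data.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]

auc = auc_from_scores(cases, controls)
print(round(auc, 3))
```

Here 8 of the 9 case-control pairs are ordered correctly, so the AUC is 8/9. With so few samples a single misordered pair moves the AUC by more than 0.1, which illustrates why the review flags small sample sizes as a limitation.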
Affiliation(s)
- Faris Shweikeh
  - Department of Internal Medicine, Cleveland Clinic Akron General, OH, USA
- Yuhao Zeng
  - Department of Internal Medicine, Cleveland Clinic Akron General, OH, USA
- Abdur Rahman Jabir
  - Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Saurav P Kadatane
  - Department of Internal Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Yuting Huang
  - Division of Gastroenterology and Hepatology, Mayo Clinic, Jacksonville, FL, USA
- Mohamad Mouchli
  - Department of Gastroenterology, Hepatology and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Dani Ran Castillo
  - Department of Hematology and Oncology, City of Hope Comprehensive Cancer Center, Duarte, CA, USA
3
Rai HM, Yoo J, Dashkevych S. Transformative Advances in AI for Precise Cancer Detection: A Comprehensive Review of Non-Invasive Techniques. Archives of Computational Methods in Engineering 2025. [DOI: 10.1007/s11831-024-10219-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 12/07/2024] [Indexed: 03/02/2025]
4
Rai HM, Yoo J, Razaque A. Comparative analysis of machine learning and deep learning models for improved cancer detection: A comprehensive review of recent advancements in diagnostic techniques. Expert Systems with Applications 2024; 255:124838. [DOI: 10.1016/j.eswa.2024.124838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
5
Vizza P, Aracri F, Guzzi PH, Gaspari M, Veltri P, Tradigo G. Machine learning pipeline to analyze clinical and proteomics data: experiences on a prostate cancer case. BMC Med Inform Decis Mak 2024; 24:93. [PMID: 38584282 PMCID: PMC11000316 DOI: 10.1186/s12911-024-02491-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Accepted: 03/25/2024] [Indexed: 04/09/2024] [Open Access]
Abstract
Proteomic-based analysis is used to identify biomarkers in blood samples and tissues. Data produced by devices such as mass spectrometers require platforms to identify and quantify proteins (or peptides). Clinical information can be related to mass spectrometry data to identify diseases at an early stage. Machine learning techniques can be used to support physicians and biologists in studying and classifying pathologies. We present the application of machine learning techniques to define a pipeline aimed at studying and classifying proteomics data enriched using clinical information. The pipeline allows users to relate established blood biomarkers with clinical parameters and proteomics data. The proposed pipeline entails three main phases: (i) feature selection, (ii) model training, and (iii) model ensembling. We report the experience of applying such a pipeline to prostate-related diseases. Models have been trained on several biological datasets. We report experimental results on two datasets that result from the integration of clinical and mass spectrometry-based data in the contexts of serum and urine analysis. The pipeline receives input data from blood analytes, tissue samples, proteomic analysis, and urine biomarkers. It then trains different models for feature selection, classification and voting. The presented pipeline has been applied to two datasets obtained in a 2-year research project which aimed to extract hidden information from mass spectrometry, serum, and urine samples from hundreds of patients. We report results on a prostate serum dataset with 143 samples, including 79 PCa and 84 BPH patients, and a urine dataset with 121 samples, including 67 PCa and 54 BPH patients. The pipeline identified interesting peptides in both datasets: 6 in the first and 2 in the second. The best model for both serum (AUC=0.87, Accuracy=0.83, F1=0.81, Sensitivity=0.84, Specificity=0.81) and urine (AUC=0.88, Accuracy=0.83, F1=0.83, Sensitivity=0.85, Specificity=0.80) datasets showed good predictive performances. We made the pipeline code available on GitHub and we are confident that it will be successfully adopted in similar clinical setups.
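The abstract reports accuracy, F1, sensitivity, and specificity for its best models. As a rough sketch of how these four numbers relate to one another, the function below derives them from a single confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not the confusion matrices behind the reported results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion counts.

    tp/fn count diseased patients classified correctly/incorrectly,
    tn/fp count non-diseased patients classified correctly/incorrectly.
    """
    return {
        "sensitivity": tp / (tp + fn),            # share of true cases found
        "specificity": tn / (tn + fp),            # share of non-cases cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),        # harmonic mean of precision and recall
    }


# Hypothetical counts for 100 patients (50 cases, 50 non-cases).
metrics = classification_metrics(tp=42, fp=10, tn=40, fn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because all four metrics come from the same four counts, they cannot vary independently, which makes this kind of calculation a quick consistency check on reported figures.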
Affiliation(s)
- Patrizia Vizza
  - Department of Surgical and Medical Sciences, Magna Græcia University, 88100, Catanzaro, Italy
- Federica Aracri
  - Department of Surgical and Medical Sciences, Magna Græcia University, 88100, Catanzaro, Italy
- Pietro Hiram Guzzi
  - Department of Surgical and Medical Sciences, Magna Græcia University, 88100, Catanzaro, Italy
- Marco Gaspari
  - Department of Experimental and Clinical Medicine, Magna Græcia University, 88100, Catanzaro, Italy
- Pierangelo Veltri
  - Department of Computers, Modeling, Electronics and Systems Engineering, University of Calabria, 87036, Rende, Italy
- Giuseppe Tradigo
  - Department of Theoretical and Applied Sciences, eCampus University, 22060, Novedrate, CO, Italy
6
Qin S, Sun S, Wang Y, Li C, Fu L, Wu M, Yan J, Li W, Lv J, Chen L. Immune, metabolic landscapes of prognostic signatures for lung adenocarcinoma based on a novel deep learning framework. Sci Rep 2024; 14:527. [PMID: 38177198 PMCID: PMC10767103 DOI: 10.1038/s41598-023-51108-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 12/30/2023] [Indexed: 01/06/2024] [Open Access]
Abstract
Lung adenocarcinoma (LUAD) is a malignant tumor with high lethality, and the aim of this study was to identify promising biomarkers for LUAD. Using the TCGA-LUAD dataset as a discovery cohort, a novel joint framework VAEjMLP based on variational autoencoder (VAE) and multilayer perceptron (MLP) was proposed. The Shapley Additive Explanations (SHAP) method was introduced to evaluate the contribution of feature genes to the classification decision, which helped us to develop a biologically meaningful biomarker potential scoring algorithm. Nineteen potential biomarkers for LUAD were identified, which were involved in the regulation of immune and metabolic functions in LUAD. A prognostic risk model for LUAD was constructed from the biomarkers HLA-DRB1, SCGB1A1, and HLA-DRB5 screened by Cox regression analysis, dividing the patients into high-risk and low-risk groups. The prognostic risk model was validated with external datasets. The low-risk group was characterized by enrichment of immune pathways and higher immune infiltration compared to the high-risk group. In contrast, the high-risk group was accompanied by an increase in metabolic pathway activity. There were significant differences between the high- and low-risk groups in metabolic reprogramming of aerobic glycolysis, amino acids, and lipids, as well as in angiogenic activity, epithelial-mesenchymal transition, tumorigenic cytokines, and inflammatory response. Furthermore, high-risk patients were more sensitive to Afatinib, Gefitinib, and Gemcitabine as predicted by the pRRophetic algorithm. This study provides prognostic signatures capable of revealing the immune and metabolic landscapes for LUAD, and may shed light on the identification of other cancer biomarkers.
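The abstract uses SHAP to attribute a classifier's decision to individual feature genes. SHAP builds on Shapley values, and the idea can be sketched exactly for a tiny model by averaging each feature's marginal contribution over all feature orderings. The three-feature toy model, the zero baseline, and every name below are assumptions made for illustration; this is not the VAEjMLP framework or the SHAP library, which relies on approximations to scale to thousands of genes.

```python
from itertools import permutations


def model(x):
    """Toy scoring function with an interaction between features 0 and 2."""
    return 2 * x[0] + x[1] + x[0] * x[2]


def value(present, x, baseline):
    """Model output when only the features in `present` take their real value."""
    masked = [x[i] if i in present else baseline[i] for i in range(len(x))]
    return model(masked)


def shapley_values(x, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        present = set()
        for i in order:
            before = value(present, x, baseline)
            present.add(i)
            # Marginal contribution of feature i given the features added so far.
            totals[i] += value(present, x, baseline) - before
    return [total / len(orderings) for total in totals]


phi = shapley_values(x=[1, 1, 1], baseline=[0, 0, 0])
print(phi)
```

The attributions sum to the difference between the model output at the sample and at the baseline, and the interaction term is split evenly between features 0 and 2. Exact enumeration grows factorially with the number of features, so it is only practical for toy cases like this one.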
Affiliation(s)
- Shimei Qin
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Shibin Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Yahui Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Chao Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Lei Fu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Ming Wu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Jinxing Yan
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Wan Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Junjie Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
- Lina Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, China
7
Rai HM. Cancer detection and segmentation using machine learning and deep learning techniques: a review. Multimedia Tools and Applications 2023; 83:27001-27035. [DOI: 10.1007/s11042-023-16520-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/16/2022] [Revised: 05/12/2023] [Accepted: 08/13/2023] [Indexed: 09/16/2023]