1
Eledkawy A, Hamza T, El-Metwally S. Towards precision oncology: a multi-level cancer classification system integrating liquid biopsy and machine learning. BioData Min 2025; 18:29. [PMID: 40217526 PMCID: PMC11987386 DOI: 10.1186/s13040-025-00439-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2025] [Accepted: 03/10/2025] [Indexed: 04/14/2025] Open
Abstract
BACKGROUND: Millions of people die from cancer every year. Early detection is crucial for higher survival rates because it allows timely medical intervention. This paper proposes a multi-level cancer classification system that uses plasma cfDNA/ctDNA mutations and protein biomarkers to identify seven distinct cancer types: colorectal, breast, upper gastrointestinal, lung, pancreas, ovarian, and liver.
RESULTS: The proposed system employs a multi-stage binary classification framework in which each stage is customized for a specific cancer type. Features are selected by majority vote across six feature selectors: Information Value, Chi-Square, Random Forest Feature Importance, Extra Tree Feature Importance, Recursive Feature Elimination, and L1 Regularization. After feature selection, classifiers (eXtreme Gradient Boosting, Random Forest, Extra Tree, and Quadratic Discriminant Analysis) are customized for each cancer type, either individually or in a soft-voting ensemble, to optimize predictive accuracy. The proposed system outperformed previously published results, achieving an AUC of 98.2% and an accuracy of 96.21%. To ensure reproducibility, the trained models and the dataset used in this study are publicly available in the GitHub repository (https://github.com/SaraEl-Metwally/Towards-Precision-Oncology).
CONCLUSION: The identified biomarkers enhance the interpretability of the diagnosis, facilitating more informed decision-making. The system's performance underscores its effectiveness in tissue localization, contributing to improved patient outcomes through timely medical interventions.
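The majority-vote feature selection and soft-voting steps named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the placeholder feature names, the "more than half of the selectors" threshold, and the example probabilities are all hypothetical, since the abstract does not specify them.

```python
from collections import Counter


def majority_vote_select(selections, min_votes=None):
    """Return the features chosen by at least min_votes selectors.

    Assumption: when min_votes is not given, a feature must be picked
    by more than half of the selectors.
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    # Count each feature once per selector that picked it.
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def soft_vote(class_probabilities):
    """Average per-class probabilities across classifiers.

    Returns the averaged probabilities and the index of the top class.
    """
    n_models = len(class_probabilities)
    n_classes = len(class_probabilities[0])
    avg = [sum(p[c] for p in class_probabilities) / n_models
           for c in range(n_classes)]
    return avg, max(range(n_classes), key=avg.__getitem__)


# Hypothetical outputs of six feature selectors (placeholder names).
selections = [
    {"protein_A", "protein_B", "mutation_X"},
    {"protein_A", "protein_B"},
    {"protein_A", "mutation_X", "protein_C"},
    {"protein_A", "protein_B", "protein_C"},
    {"protein_A", "protein_B", "mutation_X"},
    {"protein_B", "protein_D"},
]
selected = majority_vote_select(selections)

# Hypothetical [P(no cancer), P(cancer)] from three classifiers for one sample.
avg, label = soft_vote([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]])
print(selected, avg, label)
```

Here only the two features picked by at least four of the six selectors survive, and the soft vote assigns the class with the highest mean probability. A different vote threshold can be passed through `min_votes`.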
Affiliation(s)
- Amr Eledkawy
  - Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
- Taher Hamza
  - Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
- Sara El-Metwally
  - Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
  - Biomedical Informatics Department, Faculty of Computer Science and Engineering, New Mansoura University, Gamasa, 35712, Egypt
2
Jopek MA, Sieczczyński M, Pastuszak K, Łapińska-Szumczyk S, Jassem J, Żaczek AJ, Rondina MT, Supernat A. Impact of clinical factors on accuracy of ovarian cancer detection via platelet RNA profiling. Blood Adv 2025; 9:979-989. [PMID: 39715465 PMCID: PMC11907454 DOI: 10.1182/bloodadvances.2024014008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Revised: 12/05/2024] [Accepted: 12/05/2024] [Indexed: 12/25/2024] Open
Abstract
Ovarian cancer (OC) presents a diagnostic challenge, often resulting in poor patient outcomes. Platelet RNA sequencing, which reflects host response to disease, shows promise for earlier OC detection. This study examines how sex, age, platelet count, and training on cancer types other than OC affect the classification accuracy achieved with the previous platelet-alone training data set. A total of 339 samples from healthy donors and 1396 samples from patients with cancer, spanning 18 cancer types (including 135 OC cases), were analyzed. Logistic regression was applied to verify our classifiers' performance and interpretability. Models were tested at 100% specificity and 100% sensitivity levels. Incorporating patient age as an additional feature along with gene expression increased sensitivity from 68.6% to 72.6%. Models trained on data from both sexes and on female-only data achieved sensitivities of 68.6% and 74.5%, respectively. Training solely on OC data reduced late-stage sensitivity from 69.1% to 44.1% but increased early-stage sensitivity from 66.7% to 69.7%. This study highlights the potential of platelet RNA profiling for OC detection and the importance of clinical variables in refining classification accuracy. Incorporating age with gene expression data may enhance OC diagnostic accuracy. The inclusion of male samples deteriorates classifier performance. Data from diverse cancer types improve advanced cancer detection but negatively affect early-stage diagnosis.
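Reporting sensitivity at a fixed specificity, as this abstract does, amounts to choosing a score threshold from the control samples and then counting how many cases exceed it. The sketch below shows one common way to do this; the abstract does not describe how the authors set their thresholds, so the function name, the thresholding rule, and the example scores are assumptions for illustration only.

```python
import math


def sensitivity_at_specificity(control_scores, case_scores, specificity=1.0):
    """Sensitivity when the threshold keeps at least `specificity` of controls negative.

    A sample is called positive when its score is strictly above the threshold.
    Returns (sensitivity, threshold).
    """
    ranked = sorted(control_scores)
    # Number of controls that must fall at or below the threshold.
    k = math.ceil(specificity * len(ranked))
    threshold = ranked[k - 1] if k > 0 else float("-inf")
    detected = sum(1 for s in case_scores if s > threshold)
    return detected / len(case_scores), threshold


# Hypothetical classifier scores (higher means more cancer-like).
controls = [0.1, 0.2, 0.3, 0.6]
cases = [0.5, 0.7, 0.9, 0.4]

sens_100, thr_100 = sensitivity_at_specificity(controls, cases, specificity=1.0)
sens_75, thr_75 = sensitivity_at_specificity(controls, cases, specificity=0.75)
print(sens_100, thr_100, sens_75, thr_75)
```

With these made-up scores, demanding 100% specificity sets the threshold at the highest control score and detects half of the cases, while relaxing specificity to 75% lowers the threshold and detects all of them. This trade-off is why sensitivities are only comparable when quoted at the same specificity.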
Affiliation(s)
- Maksym A. Jopek
  - Laboratory of Translational Oncology, Intercollegiate Faculty of Biotechnology of the University of Gdańsk and the Medical University of Gdańsk, Gdańsk, Poland
  - Centre of Biostatistics and Bioinformatics, Medical University of Gdańsk, Gdańsk, Poland
- Michał Sieczczyński
  - Laboratory of Translational Oncology, Intercollegiate Faculty of Biotechnology of the University of Gdańsk and the Medical University of Gdańsk, Gdańsk, Poland
- Krzysztof Pastuszak
  - Laboratory of Translational Oncology, Intercollegiate Faculty of Biotechnology of the University of Gdańsk and the Medical University of Gdańsk, Gdańsk, Poland
  - Centre of Biostatistics and Bioinformatics, Medical University of Gdańsk, Gdańsk, Poland
  - Department of Algorithms and Systems Modelling, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland
- Jacek Jassem
  - Department of Oncology and Radiotherapy, Medical University of Gdańsk, Gdańsk, Poland
- Anna J. Żaczek
  - Laboratory of Translational Oncology, Intercollegiate Faculty of Biotechnology of the University of Gdańsk and the Medical University of Gdańsk, Gdańsk, Poland
- Matthew T. Rondina
  - Molecular Medicine Program, The University of Utah, Salt Lake City, UT
  - Division of Hematology and Hematologic Malignancies, Department of Internal Medicine, The University of Utah and Huntsman Cancer Institute, Salt Lake City, UT
  - George E. Wahlen Veterans Affairs Medical Center Department of Internal Medicine and the Geriatric Research Education and Clinical Center, Salt Lake City, UT
  - Department of Pathology, The University of Utah, Salt Lake City, UT
- Anna Supernat
  - Laboratory of Translational Oncology, Intercollegiate Faculty of Biotechnology of the University of Gdańsk and the Medical University of Gdańsk, Gdańsk, Poland
  - Centre of Biostatistics and Bioinformatics, Medical University of Gdańsk, Gdańsk, Poland
3
Dubrovsky G, Ross A, Jalali P, Lotze M. Liquid Biopsy in Pancreatic Ductal Adenocarcinoma: A Review of Methods and Applications. Int J Mol Sci 2024; 25:11013. [PMID: 39456796 PMCID: PMC11507494 DOI: 10.3390/ijms252011013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Revised: 10/08/2024] [Accepted: 10/11/2024] [Indexed: 10/28/2024] Open
Abstract
Pancreatic ductal adenocarcinoma (PDAC) remains a malignancy with one of the highest mortality rates. One limitation in the diagnosis and treatment of PDAC is the lack of an early and universal biomarker. Extensive recent research has sought to develop new assays that could fill this role. In this review, we discuss the current landscape of liquid biopsy in patients with PDAC. Specifically, we review the various methods of liquid biopsy, focusing on circulating tumor DNA (ctDNA) and exosomes, and future opportunities for improvement using artificial intelligence or machine learning to analyze results from a multi-omic approach. We also consider applications that have been evaluated, including the utility of liquid biopsy for screening and staging patients at diagnosis as well as before and after surgery. Finally, we examine the potential for liquid biopsy to monitor patient treatment response in the setting of clinical trial development.
Affiliation(s)
- Genia Dubrovsky
  - Department of Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Pittsburgh VA Medical Center, Pittsburgh, PA 15240, USA
- Alison Ross
  - Department of Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Pooya Jalali
  - Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Centre, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran 1983969411, Iran
- Michael Lotze
  - Departments of Surgery, Immunology, and Bioengineering, University of Pittsburgh, Pittsburgh, PA 15213, USA
4
Cai Y, Luo M, Yang W, Xu C, Wang P, Xue G, Jin X, Cheng R, Que J, Zhou W, Pang B, Xu S, Li Y, Jiang Q, Xu Z. The Deep Learning Framework iCanTCR Enables Early Cancer Detection Using the T-cell Receptor Repertoire in Peripheral Blood. Cancer Res 2024; 84:1915-1928. [PMID: 38536129 DOI: 10.1158/0008-5472.can-23-0860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Revised: 07/20/2023] [Accepted: 03/19/2024] [Indexed: 06/05/2024]
Abstract
T cells recognize tumor antigens and initiate an anticancer immune response in the very early stages of tumor development, and the antigen specificity of T cells is determined by the T-cell receptor (TCR). Therefore, monitoring changes in the TCR repertoire in peripheral blood may offer a strategy to detect various cancers at a relatively early stage. Here, we developed the deep learning framework iCanTCR to identify patients with cancer based on the TCR repertoire. The iCanTCR framework uses TCRβ sequences from an individual as an input and outputs the predicted cancer probability. The model was trained on over 2,000 publicly available TCR repertoires from 11 types of cancer and healthy controls. Analysis of several additional publicly available datasets validated the ability of iCanTCR to distinguish patients with cancer from noncancer individuals and demonstrated the capability of iCanTCR for the accurate classification of multiple cancers. Importantly, iCanTCR precisely identified individuals with early-stage cancer with an AUC of 86%. Altogether, this work provides a liquid biopsy approach to capture immune signals from peripheral blood for noninvasive cancer diagnosis.
SIGNIFICANCE: Development of a deep learning-based method for multicancer detection using the TCR repertoire in the peripheral blood establishes the potential of evaluating circulating immune signals for noninvasive early cancer detection.
Affiliation(s)
- Yideng Cai
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Meng Luo
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Wenyi Yang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Chang Xu
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Pingping Wang
  - School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, China
- Guangfu Xue
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiyun Jin
  - School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, China
- Rui Cheng
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Jinhao Que
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Wenyang Zhou
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Boran Pang
  - Center for Difficult and Complicated Abdominal Surgery, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
- Shouping Xu
  - Department of Breast Cancer, Harbin Medical University Cancer Hospital, Harbin, China
- Yu Li
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
  - School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, China
- Zhaochun Xu
  - School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, China
5
Eledkawy A, Hamza T, El-Metwally S. Precision cancer classification using liquid biopsy and advanced machine learning techniques. Sci Rep 2024; 14:5841. [PMID: 38462648 PMCID: PMC10925597 DOI: 10.1038/s41598-024-56419-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 03/06/2024] [Indexed: 03/12/2024] Open
Abstract
Cancer presents a significant global health burden, resulting in millions of deaths annually. Early detection is critical for improving survival rates because it offers a window for timely medical intervention. Liquid biopsy, which analyzes genetic variations and mutations in circulating cell-free DNA and circulating tumor DNA (cfDNA/ctDNA) or molecular biomarkers, has emerged as a tool for early detection. This study focuses on cancer detection using mutations in plasma cfDNA/ctDNA and protein biomarker concentrations. The proposed system first calculates the correlation coefficient to identify correlated features, while mutual information assesses each feature's relevance to the target variable; redundant features are then eliminated to improve efficiency. The eXtreme Gradient Boosting (XGBoost) feature importance method iteratively selects the top ten features, resulting in a 60% reduction in dataset dimensionality. The Light Gradient Boosting Machine (LGBM) model is employed for classification, with its hyperparameters tuned by random search. Final predictions are obtained by ensembling the LGBM models from tenfold cross-validation: their outputs are weighted by each model's balanced accuracy and averaged. With this methodology, the proposed system achieves 99.45% accuracy and 99.95% AUC for detecting the presence of cancer, and 93.94% accuracy and 97.81% AUC for cancer-type classification. Our methodology leads to enhanced healthcare outcomes for cancer patients.
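The final ensembling step, in which fold models are weighted by their balanced accuracy and averaged, can be sketched as below. This is an illustrative reading of the abstract, not the authors' code: the function names and the toy labels and probabilities are assumptions, and the fold models themselves (LGBM in the paper) are replaced here by plain lists of predicted probabilities.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


def weighted_ensemble(fold_probs, fold_weights):
    """Weighted average of per-sample probabilities from several fold models."""
    total = sum(fold_weights)
    n_samples = len(fold_probs[0])
    return [
        sum(w * probs[i] for probs, w in zip(fold_probs, fold_weights)) / total
        for i in range(n_samples)
    ]


# Hypothetical validation labels and predictions for one fold model.
ba = balanced_accuracy([0, 0, 1, 1], [0, 1, 1, 1])

# Hypothetical P(cancer) for two test samples from two fold models,
# with made-up weights standing in for each model's balanced accuracy.
final = weighted_ensemble([[0.2, 0.8], [0.4, 0.6]], [1.0, 3.0])
print(ba, final)
```

Weighting by balanced accuracy rather than plain accuracy keeps a fold model from earning a large weight simply by favoring the majority class.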
Affiliation(s)
- Amr Eledkawy
  - Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
- Taher Hamza
  - Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
- Sara El-Metwally
  - Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
  - Biomedical Informatics Department, Faculty of Computer Science and Engineering, New Mansoura University, Gamasa, 35712, Egypt
6
Markou A, Londra D, Tserpeli V, Kollias I, Tsaroucha E, Vamvakaris I, Potaris K, Pateras I, Kotsakis A, Georgoulias V, Lianidou E. DNA methylation analysis of tumor suppressor genes in liquid biopsy components of early stage NSCLC: a promising tool for early detection. Clin Epigenetics 2022; 14:61. [PMID: 35538556 PMCID: PMC9092693 DOI: 10.1186/s13148-022-01283-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 01/20/2022] [Accepted: 04/27/2022] [Indexed: 12/02/2022] Open
Abstract
PURPOSE: Circulating tumor cell (CTC) and circulating tumor DNA (ctDNA) analysis represents a liquid biopsy approach for real-time monitoring of tumor evolution. DNA methylation is considered to be an early event in the process of cancer development and progression. The aim of the present study was to evaluate whether detection of DNA methylation of selected tumor suppressor genes in CTCs and matched ctDNA provides prognostic information in early stage NSCLC.
EXPERIMENTAL DESIGN: The methylation status of five selected gene promoters (APC, RASSF1A, FOXA1, SLFN11, SHOX2) was examined by highly specific and sensitive real-time methylation-specific PCR assays in: (a) a training group of 35 primary tumors and their corresponding adjacent non-cancerous tissues from early stage NSCLC patients, (b) a validation group of 22 primary tumor tissues (FFPEs) and 42 peripheral blood samples from early stage NSCLC patients, in which gDNA was isolated from FFPEs, CTCs (size-based enriched by Parsortix; Angle) and plasma, and (c) a control group of healthy blood donors (n = 12).
RESULTS: All five gene promoters tested were highly methylated in the training group; methylation of the SHOX2 promoter in primary tumors was associated with unfavorable outcome. RASSF1A and APC were found methylated in plasma-cfDNA samples at 14.3% and 11.9%, respectively, whereas in the corresponding CTCs the SLFN11 and APC promoters were methylated in 7.1%. The incidence of relapse was higher in patients with (a) promoter methylation of APC and SLFN11 in plasma-cfDNA (P = 0.037 and P = 0.042, respectively) and (b) at least one methylated gene promoter detected in CTCs or plasma-cfDNA (P = 0.015).
CONCLUSIONS: DNA methylation of these five gene promoters was significantly lower in CTCs and plasma-cfDNA than in the primary tumors. Combined DNA methylation analysis in CTCs and plasma-cfDNA was associated with worse DFI in NSCLC patients. Additional studies are required to validate our findings in a large cohort of early stage NSCLC patients.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13148-022-01283-x.
Affiliation(s)
- A Markou
  - Analysis of Circulating Tumor Cells (ACTC) Lab, Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, Greece
- D Londra
  - Analysis of Circulating Tumor Cells (ACTC) Lab, Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, Greece
- V Tserpeli
  - Analysis of Circulating Tumor Cells (ACTC) Lab, Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, Greece
- I Kollias
  - Analysis of Circulating Tumor Cells (ACTC) Lab, Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, Greece
- E Tsaroucha
  - 8th Department of Pulmonary Diseases, 'Sotiria' General Hospital for Chest Diseases, Athens, Greece
- I Vamvakaris
  - 8th Department of Pulmonary Diseases, 'Sotiria' General Hospital for Chest Diseases, Athens, Greece
- K Potaris
  - 8th Department of Pulmonary Diseases, 'Sotiria' General Hospital for Chest Diseases, Athens, Greece
- I Pateras
  - Laboratory of Histology-Embryology, Molecular Carcinogenesis Group, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- A Kotsakis
  - Department of Medical Oncology, University General Hospital of Larissa, Thessaly, Greece
- V Georgoulias
  - First Department of Medical Oncology, Metropolitan General Hospital of Athens, Cholargos, Greece
- E Lianidou
  - Analysis of Circulating Tumor Cells (ACTC) Lab, Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, Greece