1
Du Y, Zhou X, Gao Q, Yang C, Huang T. A Deep Reinforcement Learning-Based Feature Selection Method for Invasive Disease Event Prediction Using Imbalanced Follow-Up Data. IEEE J Biomed Health Inform 2025; 29:1472-1483. [PMID: 40030195 DOI: 10.1109/jbhi.2024.3497325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2025]
Abstract
Machine learning-based models are a promising paradigm for predicting invasive disease events (iDEs) in breast cancer. Feature selection (FS) is an essential preprocessing technique employed to identify the pertinent features for the prediction model. However, conventional FS methods often fail with imbalanced clinical data because of their bias towards the majority class. In this paper, a novel FS framework based on reinforcement learning (RLFS) is developed to identify the optimal feature subset for imbalanced data. The RLFS employs an iterative methodology, wherein a data resampling technique generates a balanced dataset before each iteration. A decision network is trained using a deep RL algorithm to identify the relevant features for the dataset in the current iteration. With such an iterative training strategy, the numerous constructed datasets gradually boost the FS capacity of the decision network, resulting in robust performance on imbalanced data. Finally, a weighted model is proposed to determine the most suitable FS solution. The RLFS is employed to predict breast cancer iDEs using real follow-up data. The comparison results demonstrate that RLFS effectively reduces the number of features while outperforming several state-of-the-art FS algorithms.
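The resampling step mentioned in this abstract can be pictured with a minimal sketch. The abstract does not say which resampling technique is used, so random oversampling is an assumption here, and the function name `balanced_resample` and the toy data are hypothetical. The sketch only shows how a fresh balanced dataset could be drawn before each iteration; it does not reproduce the decision network or the RL training.

```python
import random
from collections import defaultdict


def balanced_resample(samples, labels, seed=None):
    """Randomly oversample every class up to the size of the largest class.

    Illustrative only: random oversampling is an assumed stand-in for the
    unspecified resampling technique mentioned in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(group) for group in by_class.values())
    resampled = []
    for label, group in by_class.items():
        # Keep all original samples, then draw extras with replacement.
        extras = [rng.choice(group) for _ in range(target - len(group))]
        resampled.extend((sample, label) for sample in group + extras)

    rng.shuffle(resampled)
    new_samples = [sample for sample, _ in resampled]
    new_labels = [label for _, label in resampled]
    return new_samples, new_labels


# Hypothetical toy data: 6 patients without an event (0), 2 with an event (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

for iteration in range(3):
    # A different seed per iteration yields a different balanced dataset.
    Xb, yb = balanced_resample(X, y, seed=iteration)
    print(iteration, yb.count(0), yb.count(1))  # class counts are 6 and 6
```

Each call keeps every original sample and duplicates minority-class samples at random, so successive iterations see the same majority class but differently repeated minority cases.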
2
Fanizzi A, Bove S, Comes MC, Di Benedetto EF, Latorre A, Giotta F, Nardone A, Rizzo A, Soranno C, Zito A, Massafra R. Prediction of breast cancer Invasive Disease Events using transfer learning on clinical data as image-form. PLoS One 2024; 19:e0312036. [PMID: 39570983 PMCID: PMC11581389 DOI: 10.1371/journal.pone.0312036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 09/30/2024] [Indexed: 11/25/2024] Open
Abstract
BACKGROUND AND OBJECTIVE: Detecting patients at high risk of an Invasive Disease Event after a first diagnosis of breast cancer, such as recurrence, distant metastasis, contralateral tumor and second tumor, could support clinical decision-making in the treatment of this malignancy. Although several machine learning models analyzing both clinical and histopathological information have been developed in the literature to address this task, these approaches turned out to be unsuitable for describing this problem.
METHODS: In this study, we designed a novel artificial intelligence-based approach that converts clinical information into an image form to be analyzed through Convolutional Neural Networks. Specifically, we predicted the occurrence of an Invasive Disease Event at both 5-year and 10-year follow-ups for 696 female patients with a first invasive breast cancer diagnosis enrolled at IRCCS "Giovanni Paolo II" in Bari, Italy. After transforming each patient, represented by a vector of clinical information, into an image form, we extracted low-level quantitative imaging features by means of a pre-trained Convolutional Neural Network, namely AlexNet. Then, we classified breast cancer patients into two classes, Invasive Disease Event and non-Invasive Disease Event, via a Support Vector Machine classifier trained on a subset of significant features previously identified.
RESULTS: Both the 5-year and 10-year models proved particularly accurate in predicting breast cancer recurrence events, achieving AUC values of 92.07% and 92.84%, accuracies of 88.71% and 88.82%, sensitivities of 86.83% and 88.06%, specificities of 89.55% and 89.3%, and precisions of 71.93% and 84.82%, respectively.
CONCLUSIONS: This is the first study proposing an approach that converts clinical information into an image form to develop a decision support system for identifying patients at high risk of an Invasive Disease Event, and then for defining personalized oncological treatments for breast cancer patients.
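For readers less familiar with the measures reported in this abstract, the sketch below shows how accuracy, sensitivity, specificity and precision are conventionally derived from a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts.

    tp/fn: patients with an event predicted correctly / missed.
    tn/fp: patients without an event predicted correctly / flagged wrongly.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=40, fp=10, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With imbalanced classes, precision can sit well below sensitivity and specificity because it depends on how many non-event patients are flagged relative to true events.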
Affiliation(s)
- Samantha Bove
  - I.R.C.C.S. Istituto Tumori “Giovanni Paolo II”, Bari, Italy
- Agnese Latorre
  - I.R.C.C.S. Istituto Tumori “Giovanni Paolo II”, Bari, Italy
- Clara Soranno
  - I.R.C.C.S. Istituto Tumori “Giovanni Paolo II”, Bari, Italy
- Alfredo Zito
  - I.R.C.C.S. Istituto Tumori “Giovanni Paolo II”, Bari, Italy
3
Darbandi MR, Darbandi M, Darbandi S, Bado I, Hadizadeh M, Khorram Khorshid HR. Artificial intelligence breakthroughs in pioneering early diagnosis and precision treatment of breast cancer: A multimethod study. Eur J Cancer 2024; 209:114227. [PMID: 39053289 DOI: 10.1016/j.ejca.2024.114227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Accepted: 07/07/2024] [Indexed: 07/27/2024]
Abstract
This article delves into the potential of artificial intelligence (AI) to enhance early breast cancer (BC) detection for improved treatment outcomes and patient care. Utilizing a multimethod approach comprising literature review and experiments, the study systematically reviewed 310 articles utilizing 30 diverse datasets. Among the techniques assessed, recurrent neural network (RNN) emerged as the most accurate, achieving 98.58% accuracy, followed by genetic principles (GP), transfer learning (TL), and artificial neural networks (ANNs), with accuracies exceeding 96%. While conventional machine learning (ML) methods demonstrated accuracies above 90%, DL techniques outperformed them. Evaluation of BC diagnostic models using the Wisconsin breast cancer dataset (WBCD) highlighted logistic regression (LR) and support vector machine (SVM) as the most accurate predictors, with minimal errors for clinical data. Conversely, decision trees (DT) exhibited higher error rates due to overfitting, emphasizing the importance of algorithm selection for complex datasets. Analysis of ultrasound images underscored the significance of preprocessing, while histopathological image analysis using convolutional neural networks (CNNs) demonstrated robust classification capabilities. These findings underscore the transformative potential of ML and DL in BC diagnosis, offering automated, accurate, and accessible diagnostic tools. Collaboration among stakeholders is crucial for further advancements in BC detection methods.
Affiliation(s)
- Mahsa Darbandi
  - Fetal Health Research Center, Hope Generation Foundation, Tehran, Iran.
- Sara Darbandi
  - Gene Therapy and Regenerative Medicine Research Center, Hope Generation Foundation, Tehran, Iran.
- Igor Bado
  - Department of Oncological Sciences, Tisch Cancer Institute, New York, USA.
- Mohammad Hadizadeh
  - Cancer Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- Hamid Reza Khorram Khorshid
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran; Personalized Medicine and Genometabolics Research Center, Hope Generation Foundation, Tehran, Iran.
4
Narasimha V, T RR, Kadiyala R, Paritala C, Shariff V, Rakesh V. Assessing the Resilience of Machine Learning Models in Predicting Long-Term Breast Cancer Recurrence Results. 2024 8th International Conference on Inventive Systems and Control (ICISC) 2024:416-422. [DOI: 10.1109/icisc62624.2024.00077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2025]
Affiliation(s)
- Vadthe Narasimha
  - CMR College of Engineering & Technology, Department of CSE, Hyderabad, India
- Rama Reddy T
  - Aditya University, Department of CSE, Surampalem, India
- Chiranjeevi Paritala
  - Amrita Sai Institute of Science and Technology, Department of CSE, Bathinapadu, India
- V Rakesh
  - B V Raju Institute of Technology Narsapur, Department of IT, Medak, Telangana, India
5
Li W, Gou F, Wu J. Artificial intelligence auxiliary diagnosis and treatment system for breast cancer in developing countries. Journal of X-Ray Science and Technology 2024; 32:395-413. [PMID: 38189731 DOI: 10.3233/xst-230194] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/09/2024]
Abstract
BACKGROUND: In many developing countries, a significant number of breast cancer patients are unable to receive timely treatment due to a large population base, high patient numbers, and limited medical resources.
OBJECTIVE: This paper proposes a breast cancer assisted diagnosis system based on electronic medical records. The goal of this system is to address the limitations of existing systems, which primarily rely on structured electronic records and may miss crucial information stored in unstructured records.
METHODS: The proposed approach is a breast cancer assisted diagnosis system based on electronic medical records. The system utilizes breast cancer enhanced convolutional neural networks with semantic initialization filters (BC-INIT-CNN). It extracts highly relevant tumor markers from unstructured medical records to aid in breast cancer staging diagnosis and effectively utilizes the important information present in unstructured records.
RESULTS: The model's performance is assessed using various evaluation metrics, such as accuracy, ROC curves, and Precision-Recall curves. Comparative analysis demonstrates that the BC-INIT-CNN model outperforms several existing methods in terms of accuracy and computational efficiency.
CONCLUSIONS: The proposed breast cancer assisted diagnosis system based on BC-INIT-CNN showcases the potential to address the challenges faced by developing countries in providing timely treatment to breast cancer patients. By leveraging unstructured medical records and extracting relevant tumor markers, the system enables accurate staging diagnosis and enhances the utilization of valuable information.
Affiliation(s)
- Wenxiu Li
  - State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Fangfang Gou
  - State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Jia Wu
  - State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
  - Research Center for Artificial Intelligence, Monash University, Melbourne, Clayton VIC, Australia
6
Maouche I, Terrissa LS, Benmohammed K, Zerhouni N. An Explainable AI Approach for Breast Cancer Metastasis Prediction Based on Clinicopathological Data. IEEE Trans Biomed Eng 2023; 70:3321-3329. [PMID: 37276094 DOI: 10.1109/tbme.2023.3282840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
OBJECTIVE: Breast cancer is the most prevalent cancer and the leading cause of cancer deaths among women worldwide. In 90% of cases, mortality is related to distant metastasis. Computer-aided prognosis systems using machine learning models have been widely used to predict breast cancer metastasis. Despite that, these systems still face several challenges. First, the models are generally biased toward the majority class due to dataset imbalance. Second, their increased complexity is associated with decreased interpretability, which causes clinicians to distrust their prognoses.
METHODS: To tackle these issues, we propose an explainable approach for predicting breast cancer metastasis using clinicopathological data. Our approach is based on a cost-sensitive CatBoost classifier and utilises the LIME explainer to provide patient-level explanations.
RESULTS: We used a public dataset of 716 breast cancer patients to assess our approach. The results demonstrate the superiority of cost-sensitive CatBoost in precision (76.5%), recall (79.5%), and F1-score (77%) over classical and boosting models. The LIME explainer was used to quantify the impact of patient and treatment characteristics on breast cancer metastasis, revealing impacts that range from high, such as the non-use of adjuvant chemotherapy, through moderate, such as the carcinoma with medullary features histological type, to low, such as oral contraception use. The code is available at https://github.com/IkramMaouche/CS-CatBoost
CONCLUSION: Our approach serves as a first step toward introducing more efficient and explainable computer-aided prognosis systems for breast cancer metastasis prediction.
SIGNIFICANCE: This approach could help clinicians understand the factors behind metastasis and assist them in proposing more patient-specific therapeutic decisions.
7
Shiner A, Kiss A, Saednia K, Jerzak KJ, Gandhi S, Lu FI, Emmenegger U, Fleshner L, Lagree A, Alera MA, Bielecki M, Law E, Law B, Kam D, Klein J, Pinard CJ, Shenfield A, Sadeghi-Naini A, Tran WT. Predicting Patterns of Distant Metastasis in Breast Cancer Patients following Local Regional Therapy Using Machine Learning. Genes (Basel) 2023; 14:1768. [PMID: 37761908 PMCID: PMC10531341 DOI: 10.3390/genes14091768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 09/05/2023] [Accepted: 09/06/2023] [Indexed: 09/29/2023] Open
Abstract
Up to 30% of breast cancer (BC) patients will develop distant metastases (DM), for which there is no cure. Here, statistical and machine learning (ML) models were developed to estimate the risk of site-specific DM following local-regional therapy. This retrospective study cohort included 175 patients diagnosed with invasive BC who later developed DM. Clinicopathological information was collected for analysis. Outcome variables were the first site of metastasis (brain, bone or visceral) and the time interval (months) to developing DM. Multivariate statistical analysis and ML-based multivariable gradient boosting machines identified factors associated with these outcomes. Machine learning models predicted the site of DM, demonstrating an area under the curve of 0.74, 0.75, and 0.73 for brain, bone and visceral sites, respectively. Overall, most patients (57%) developed bone metastases, with increased odds associated with estrogen receptor (ER) positivity. Human epidermal growth factor receptor-2 (HER2) positivity and non-anthracycline chemotherapy regimens were associated with a decreased risk of bone DM, while brain metastasis was associated with ER-negativity. Furthermore, non-anthracycline chemotherapy alone was a significant predictor of visceral metastasis. Here, clinicopathologic and treatment variables used in ML prediction models predict the first site of metastasis in BC. Further validation may guide focused patient-specific surveillance practices.
Affiliation(s)
- Audrey Shiner
  - Department of Radiation Oncology, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada
  - Biological Sciences Platform, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, ON M5S 1A8, Canada
- Alex Kiss
  - Institute of Clinical Evaluative Sciences, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada
- Khadijeh Saednia
  - Department of Radiation Oncology, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada
  - Department of Electrical Engineering and Computer Science, Lassonde School of Engineering, York University, Toronto, ON M3J 1P3, Canada
- Katarzyna J. Jerzak
  - Division of Medical Oncology, Department of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada
- Sonal Gandhi
  - Division of Medical Oncology, Department of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada
- Fang-I Lu
  - Department of Anatomic Pathology, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada
- Urban Emmenegger
  - Division of Medical Oncology, Department of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada
- Lauren Fleshner
  - Department of Radiation Oncology, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada
  - Biological Sciences Platform, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, ON M5S 1A8, Canada
- Andrew Lagree
  - Biological Sciences Platform, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada
- Marie Angeli Alera
  - Biological Sciences Platform, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada
- Mateusz Bielecki
  - Department of Radiation Oncology, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada
  - Biological Sciences Platform, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada
- Ethan Law
  - Biological Sciences Platform, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada
- Brianna Law
  - Biological Sciences Platform, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada
- Dylan Kam
  - Biological Sciences Platform, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada
- Jonathan Klein
  - Department of Radiation Oncology, Albert Einstein College of Medicine, New York, NY 10461, USA
- Christopher J. Pinard
  - Biological Sciences Platform, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada
- Alex Shenfield
  - Department of Engineering and Mathematics, Sheffield Hallam University, Sheffield S1 1WB, UK
- Ali Sadeghi-Naini
  - Department of Radiation Oncology, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada
  - Department of Electrical Engineering and Computer Science, Lassonde School of Engineering, York University, Toronto, ON M3J 1P3, Canada
- William T. Tran
  - Department of Radiation Oncology, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada
  - Biological Sciences Platform, Sunnybrook Research Institute, Toronto, ON M4N 3M5, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Department of Radiation Oncology, University of Toronto, Toronto, ON M5S 1A8, Canada
8
Joy A, Lin M, Joines M, Saucedo A, Lee-Felker S, Baker J, Chien A, Emir U, Macey PM, Thomas MA. Ensemble Learning for Breast Cancer Lesion Classification: A Pilot Validation Using Correlated Spectroscopic Imaging and Diffusion-Weighted Imaging. Metabolites 2023; 13:835. [PMID: 37512542 PMCID: PMC10385820 DOI: 10.3390/metabo13070835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 07/04/2023] [Accepted: 07/06/2023] [Indexed: 07/30/2023] Open
Abstract
The main objective of this work was to evaluate the application of individual and ensemble machine learning models to classify malignant and benign breast masses using features from two-dimensional (2D) correlated spectroscopy spectra extracted from five-dimensional echo-planar correlated spectroscopic imaging (5D EP-COSI) and diffusion-weighted imaging (DWI). Twenty-four different metabolite and lipid ratios with respect to diagonal fat peaks (1.4 ppm, 5.4 ppm) from 2D spectra, and water and fat peaks (4.7 ppm, 1.4 ppm) from one-dimensional non-water-suppressed (NWS) spectra were used as the features. Additionally, water fraction, fat fraction and water-to-fat ratios from NWS spectra and apparent diffusion coefficients (ADC) from DWI were included. The nine most important features were identified using recursive feature elimination, sequential forward selection and correlation analysis. XGBoost (AUC: 93.0%, Accuracy: 85.7%, F1-score: 88.9%, Precision: 88.2%, Sensitivity: 90.4%, Specificity: 84.6%) and GradientBoost (AUC: 94.3%, Accuracy: 89.3%, F1-score: 90.7%, Precision: 87.9%, Sensitivity: 94.2%, Specificity: 83.4%) were the best-performing models. Conventional biomarkers like choline, myo-Inositol, and glycine were statistically significant predictors. Key features contributing to the classification were ADC, 2D diagonal peaks at 0.9 ppm, 2.1 ppm, 3.5 ppm, and 5.4 ppm, cross peaks between 1.4 and 0.9 ppm, 4.3 and 4.1 ppm, 2.3 and 1.6 ppm, and the triglyceryl-fat cross peak. The results highlight the contribution of the 2D spectral peaks to the model, and they demonstrate the potential of 5D EP-COSI for early breast cancer detection.
Affiliation(s)
- Ajin Joy
  - Radiological Sciences, University of California Los Angeles, Los Angeles, CA 90095, USA
- Marlene Lin
  - Radiological Sciences, University of California Los Angeles, Los Angeles, CA 90095, USA
- Melissa Joines
  - Radiological Sciences, University of California Los Angeles, Los Angeles, CA 90095, USA
- Andres Saucedo
  - Radiological Sciences, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Physics and Biology in Medicine-Inter-Departmental Graduate Program, University of California Los Angeles, Los Angeles, CA 90095, USA
- Stephanie Lee-Felker
  - Radiological Sciences, University of California Los Angeles, Los Angeles, CA 90095, USA
- Jennifer Baker
  - Surgery, University of California Los Angeles, Los Angeles, CA 90095, USA
- Aichi Chien
  - Radiological Sciences, University of California Los Angeles, Los Angeles, CA 90095, USA
- Uzay Emir
  - School of Health Sciences, College of Health and Human Sciences, Purdue University, West Lafayette, IN 47907, USA
- Paul M. Macey
  - School of Nursing, University of California Los Angeles, Los Angeles, CA 90095, USA
- M. Albert Thomas
  - Radiological Sciences, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Physics and Biology in Medicine-Inter-Departmental Graduate Program, University of California Los Angeles, Los Angeles, CA 90095, USA
  - BioEngineering, University of California Los Angeles, Los Angeles, CA 90095, USA
9
Fanizzi A, Pomarico D, Rizzo A, Bove S, Comes MC, Didonna V, Giotta F, La Forgia D, Latorre A, Pastena MI, Petruzzellis N, Rinaldi L, Tamborra P, Zito A, Lorusso V, Massafra R. Machine learning survival models trained on clinical data to identify high risk patients with hormone responsive HER2 negative breast cancer. Sci Rep 2023; 13:8575. [PMID: 37237020 PMCID: PMC10220052 DOI: 10.1038/s41598-023-35344-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 05/16/2023] [Indexed: 05/28/2023] Open
Abstract
For endocrine-positive HER2-negative breast cancer patients at an early stage, the benefit of adding chemotherapy to adjuvant endocrine therapy is still not confirmed. Several genomic tests are available on the market but are very expensive. Therefore, there is an urgent need to explore novel, reliable and less expensive prognostic tools in this setting. In this paper, we present a machine learning survival model to estimate Invasive Disease-Free Events, trained on clinical and histological data commonly collected in clinical practice. We collected clinical and cytohistological outcomes of 145 patients referred to Istituto Tumori "Giovanni Paolo II". Three machine learning survival models are compared with Cox proportional hazards regression according to time-dependent performance metrics evaluated in cross-validation. The c-index at 10 years obtained by random survival forest, gradient boosting, and component-wise gradient boosting is stable, with or without feature selection, at approximately 0.68 on average, compared with 0.57 obtained by the Cox model. Moreover, the machine learning survival models accurately discriminated low- and high-risk patients, thereby identifying a large group that can be spared chemotherapy in addition to hormone therapy. The preliminary results obtained by including only clinical determinants are encouraging. The integrated use of data already collected in clinical practice for routine diagnostic investigations, if properly analyzed, can reduce the time and costs of genomic tests.
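The c-index quoted in this abstract measures how well predicted risk orders patients by time to event. The sketch below computes the plain Harrell's concordance index on toy data. It is a simplification offered for orientation: the study reports a time-dependent c-index at 10 years under cross-validation, which this sketch does not reproduce, and the function name and data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs that are ranked correctly.

    A pair (i, j) is comparable when subject i has the shorter observed time
    and experienced the event (events[i] == 1). The pair is concordant when
    subject i also has the higher risk score; tied scores count as 0.5.
    Pairs with equal observed times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up data: years observed, event flag, model risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]

print(concordance_index(times, events, risk))  # 4 of 5 comparable pairs: 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the 0.68 and 0.57 figures in the abstract.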
Affiliation(s)
- Annarita Fanizzi
  - Struttura Semplice Dipartimentale di Fisica Sanitaria, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Domenico Pomarico
  - Struttura Semplice Dipartimentale di Fisica Sanitaria, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Alessandro Rizzo
  - Struttura Semplice Dipartimentale di Oncologia Per la Presa in Carico Globale del Paziente Oncologico "Don Tonino Bello", I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Samantha Bove
  - Struttura Semplice Dipartimentale di Fisica Sanitaria, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Maria Colomba Comes
  - Struttura Semplice Dipartimentale di Fisica Sanitaria, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Vittorio Didonna
  - Struttura Semplice Dipartimentale di Fisica Sanitaria, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Francesco Giotta
  - Unità Operativa Complessa di Oncologia Medica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Daniele La Forgia
  - Struttura Semplice Dipartimentale di Radiologia Senologica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Agnese Latorre
  - Unità Operativa Complessa di Oncologia Medica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Maria Irene Pastena
  - Unità Operativa Complessa di Anatomia Patologica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Nicole Petruzzellis
  - Struttura Semplice Dipartimentale di Fisica Sanitaria, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Lucia Rinaldi
  - Struttura Semplice Dipartimentale di Oncologia Per la Presa in Carico Globale del Paziente Oncologico "Don Tonino Bello", I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Pasquale Tamborra
  - Struttura Semplice Dipartimentale di Fisica Sanitaria, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Alfredo Zito
  - Unità Operativa Complessa di Anatomia Patologica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Vito Lorusso
  - Unità Operativa Complessa di Oncologia Medica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
- Raffaella Massafra
  - Struttura Semplice Dipartimentale di Fisica Sanitaria, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Viale Orazio Flacco 65, 70124, Bari, Italy
10
Massafra R, Fanizzi A, Amoroso N, Bove S, Comes MC, Pomarico D, Didonna V, Diotaiuti S, Galati L, Giotta F, La Forgia D, Latorre A, Lombardi A, Nardone A, Pastena MI, Ressa CM, Rinaldi L, Tamborra P, Zito A, Paradiso AV, Bellotti R, Lorusso V. Analyzing breast cancer invasive disease event classification through explainable artificial intelligence. Front Med (Lausanne) 2023; 10:1116354. [PMID: 36817766 PMCID: PMC9932275 DOI: 10.3389/fmed.2023.1116354] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 12/05/2022] [Accepted: 01/13/2023] [Indexed: 02/05/2023] Open
Abstract
Introduction: Recently, accurate machine learning and deep learning approaches have been dedicated to the investigation of breast cancer invasive disease events (IDEs), such as recurrence, contralateral and second cancers. However, such approaches are poorly interpretable.
Methods: Thus, we designed an Explainable Artificial Intelligence (XAI) framework to investigate IDEs within a cohort of 486 breast cancer patients enrolled at IRCCS Istituto Tumori "Giovanni Paolo II" in Bari, Italy. Using Shapley values, we determined the IDE driving features according to two periods, often adopted in clinical practice, of 5 and 10 years from the first tumor diagnosis.
Results: Age, tumor diameter, surgery type, and multiplicity are predominant within the 5-year frame, while therapy-related features, including hormone, chemotherapy schemes and lymphovascular invasion, dominate the 10-year IDE prediction. Estrogen Receptor (ER), proliferation marker Ki67 and metastatic lymph nodes affect both frames.
Discussion: Thus, our framework aims at shortening the distance between AI and clinical practice.
Affiliation(s)
- Nicola Amoroso
  - INFN, Sezione di Bari, Bari, Italy
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Samantha Bove
  - IRCCS Istituto Tumori “Giovanni Paolo II”, Bari, Italy
- Domenico Pomarico
  - INFN, Sezione di Bari, Bari, Italy
  - Dipartimento di Fisica, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Luisa Galati
  - International Agency for Research on Cancer, Lyon, France
- Angela Lombardi
  - Dipartimento di Ingegneria Elettrica e dell'Informazione, Politecnico di Bari, Bari, Italy
- Lucia Rinaldi
  - IRCCS Istituto Tumori “Giovanni Paolo II”, Bari, Italy
- Alfredo Zito
  - IRCCS Istituto Tumori “Giovanni Paolo II”, Bari, Italy
- Roberto Bellotti
  - INFN, Sezione di Bari, Bari, Italy
  - Dipartimento di Fisica, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Vito Lorusso
  - IRCCS Istituto Tumori “Giovanni Paolo II”, Bari, Italy
11
Joy A, Saucedo A, Joines M, Lee-Felker S, Kumar S, Sarma MK, Sayre J, DiNome M, Thomas MA. Correlated MR spectroscopic imaging of breast cancer to investigate metabolites and lipids: acceleration and compressed sensing reconstruction. BJR Open 2022; 4:20220009. [PMID: 36860693 PMCID: PMC9969076 DOI: 10.1259/bjro.20220009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Revised: 08/23/2022] [Accepted: 08/25/2022] [Indexed: 11/05/2022] Open
Abstract
Objectives: The main objective of this work was to detect novel biomarkers in breast cancer by spreading the MR spectra over two dimensions in multiple spatial locations using an accelerated 5D EP-COSI technology.
Methods: The 5D EP-COSI data were non-uniformly undersampled with an acceleration factor of 8 and reconstructed using group sparsity-based compressed sensing reconstruction. Different metabolite and lipid ratios were then quantified and statistically analyzed for significance. Linear discriminant models based on the quantified metabolite and lipid ratios were generated. Spectroscopic images of the quantified metabolite and lipid ratios were also reconstructed.
Results: The 2D COSY spectra generated using the 5D EP-COSI technique showed differences among healthy, benign, and malignant tissues in terms of their mean values of metabolite and lipid ratios, especially the ratios of potential novel biomarkers based on unsaturated fatty acids, myo-inositol, and glycine. The results further show the potential of choline and unsaturated lipid ratio maps, generated from the quantified COSY signals across multiple locations in the breast, to serve as complementary markers of malignancy that can be added to the multiparametric MR protocol. Discriminant models using metabolite and lipid ratios were found to be statistically significant for classifying benign and malignant tumors from healthy tissues.
Conclusions: The accelerated 5D EP-COSI technique demonstrates the potential to detect novel biomarkers such as glycine, myo-inositol, and unsaturated fatty acids in addition to the commonly reported choline in breast cancer, and facilitates metabolite and lipid ratio maps, which have the potential to play a significant role in breast cancer detection.
Advances in knowledge: This study presents the first evaluation of a multidimensional MR spectroscopic imaging technique for the detection of potentially novel biomarkers based on glycine, myo-inositol, and unsaturated fatty acids, in addition to the commonly reported choline. Spatial mapping of choline and unsaturated fatty acid ratios with respect to water in malignant and benign breast masses is also shown. These metabolic characteristics may serve as additional biomarkers for improving the diagnostic and therapeutic evaluation of breast cancer.
Affiliation(s)
- Ajin Joy
  - Radiological Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Melissa Joines
  - Radiological Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Stephanie Lee-Felker
  - Radiological Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Sumit Kumar
  - Radiological Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Manoj K Sarma
  - Radiological Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- James Sayre
  - Radiological Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Maggie DiNome
  - Surgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
|
12
|
Wu M, Zhao Y, Dong X, Jin Y, Cheng S, Zhang N, Xu S, Gu S, Wu Y, Yang J, Yao L, Wang Y. Artificial intelligence-based preoperative prediction system for diagnosis and prognosis in epithelial ovarian cancer: A multicenter study. Front Oncol 2022; 12:975703. [PMID: 36212430 PMCID: PMC9532858 DOI: 10.3389/fonc.2022.975703] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/22/2022] [Accepted: 08/11/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Ovarian cancer (OC) is the most lethal gynecological malignancy, with limited early screening methods and poor prognosis. Artificial intelligence technology has made a great breakthrough in cancer diagnosis. Purpose: We aim to develop a specific interpretable machine learning (ML) prediction model for the diagnosis and prognosis of epithelial ovarian cancer (EOC) based on a variety of biomarkers. Methods: A total of 521 patients with EOC and 144 patients with benign gynecological diseases were enrolled, comprising derivation datasets and an external validation cohort. Predictions were obtained from 9 supervised ML methods using 34 parameters. The reasoning behind the predictions of the best ML model was explained using the SHapley Additive exPlanations (SHAP) algorithm. In addition, the prognosis of EOC was analyzed by unsupervised clustering and Kaplan–Meier (KM) survival analysis. Results: ML technology was superior to conventional logistic regression in predicting EOC diagnosis, and XGBoost performed best in the external validation datasets. The AUC values for distinguishing EOC from benign disease and for determining pathological type, grade, and clinical stage were 0.958 (0.926-0.989), 0.792 (0.701-0.8834), 0.819 (0.687-0.950), and 0.68 (0.573-0.788), respectively. For CA-125-negative EOC patients, the AUC of the XGBoost model was 0.835 (0.763-0.907). We used unsupervised cluster analysis to identify EOC subgroups with significantly poor overall survival (p-value <0.0001) and recurrence-free survival (p-value <0.0001). Conclusions: Based on preoperative characteristics, we showed that ML algorithms can provide an acceptable diagnosis and prognosis prediction model for EOC patients. Meanwhile, SHAP analysis can improve the interpretability of ML models and contribute to precision medicine.
Affiliation(s)
- Meixuan Wu
  - Department of Obstetrics and Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
  - Department of Obstetrics and Gynecology, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Yaqian Zhao
  - Department of Obstetrics and Gynecology, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Xuhui Dong
  - Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
- Yue Jin
  - Department of Obstetrics and Gynecology, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Shanshan Cheng
  - Department of Obstetrics and Gynecology, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Nan Zhang
  - Department of Obstetrics and Gynecology, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Shilin Xu
  - Department of Obstetrics and Gynecology, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Sijia Gu
  - Department of Obstetrics and Gynecology, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Yongsong Wu
  - Department of Obstetrics and Gynecology, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Jiani Yang
  - Department of Obstetrics and Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
  - *Correspondence: Yu Wang; Liangqing Yao; Jiani Yang
- Liangqing Yao
  - Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
  - *Correspondence: Yu Wang; Liangqing Yao; Jiani Yang
- Yu Wang
  - Department of Obstetrics and Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
  - *Correspondence: Yu Wang; Liangqing Yao; Jiani Yang
|
13
|
Massafra R, Comes MC, Bove S, Didonna V, Diotaiuti S, Giotta F, Latorre A, La Forgia D, Nardone A, Pomarico D, Ressa CM, Rizzo A, Tamborra P, Zito A, Lorusso V, Fanizzi A. A machine learning ensemble approach for 5- and 10-year breast cancer invasive disease event classification. PLoS One 2022; 17:e0274691. [PMID: 36121822 PMCID: PMC9484691 DOI: 10.1371/journal.pone.0274691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 09/02/2022] [Indexed: 12/24/2022] Open
Abstract
Designing targeted treatments for breast cancer patients after primary tumor removal is necessary to prevent the occurrence of invasive disease events (IDEs), such as recurrence, metastasis, contralateral and second tumors, over time. However, due to the molecular heterogeneity of this disease, predicting the outcome and efficacy of the adjuvant therapy is challenging. A novel ensemble machine learning classification approach was developed to address the task of producing prognostic predictions of the occurrence of breast cancer IDEs at both 5 and 10 years. The method is based on the concept of voting among multiple models to give a final prediction for each individual patient. Promising results were achieved on a cohort of 529 patients, whose data, related to primary breast cancer, were provided by Istituto Tumori "Giovanni Paolo II" in Bari, Italy. Our proposal greatly improves on the performance of the baseline original model, i.e., without voting, reaching a median AUC value of 77.1% and 76.3% for the IDE prediction at 5 and 10 years, respectively. Finally, the proposed approach supports more intelligible decisions, and hence greater acceptability in clinical practice, since it returns an explanation of the IDE prediction for each individual patient through the voting procedure.
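The voting idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `vote_on_ide`, the model names, and the tie-breaking rule (a tie counts as "no event") are all assumptions made for illustration.

```python
def vote_on_ide(predictions_by_model):
    """Combine binary IDE predictions (1 = event, 0 = no event) by majority vote.

    predictions_by_model maps a hypothetical model name to its 0/1 prediction
    for a single patient. Returns the winning label together with the sorted
    names of the models that voted for it, which serves as a simple
    per-patient explanation. A tie is resolved as 0 (assumption).
    """
    votes_for_event = sum(predictions_by_model.values())
    n_models = len(predictions_by_model)
    # Strict majority is required to predict an event.
    label = 1 if votes_for_event * 2 > n_models else 0
    supporters = sorted(
        name for name, pred in predictions_by_model.items() if pred == label
    )
    return label, supporters


# Example with three hypothetical base models for one patient.
label, supporters = vote_on_ide({"model_a": 1, "model_b": 0, "model_c": 1})
print(label, supporters)
```

Returning the supporting models alongside the label mirrors the abstract's point that the voting procedure itself yields a patient-level explanation.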
Affiliation(s)
- Samantha Bove
  - I.R.C.C.S. Istituto Tumori “Giovanni Paolo II”, Bari, Italy
- Agnese Latorre
  - I.R.C.C.S. Istituto Tumori “Giovanni Paolo II”, Bari, Italy
- Domenico Pomarico
  - Dipartimento di Fisica and MECENAS, Università di Bari, Bari, Italy
  - INFN, Sezione di Bari, Bari, Italy
- Alfredo Zito
  - I.R.C.C.S. Istituto Tumori “Giovanni Paolo II”, Bari, Italy
- Vito Lorusso
  - I.R.C.C.S. Istituto Tumori “Giovanni Paolo II”, Bari, Italy
|
14
|
Wang J. Prediction of postoperative recovery in patients with acoustic neuroma using machine learning and SMOTE-ENN techniques. Mathematical Biosciences and Engineering : MBE 2022; 19:10407-10423. [PMID: 36032000 DOI: 10.3934/mbe.2022487] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
Acoustic neuroma is a common benign tumor that is frequently associated with postoperative complications such as facial nerve dysfunction, which greatly affects the physical and mental health of patients. In this paper, clinical data of patients with acoustic neuroma treated with microsurgery by the same operator at Xiangya Hospital of Central South University from June 2018 to March 2020 are used as the study object. Machine learning and SMOTE-ENN techniques are used to accurately predict postoperative facial nerve function recovery, thus filling a gap in auxiliary diagnosis within the field of facial nerve treatment in acoustic neuroma. First, raw clinical data are processed and dependent variables are identified based on clinical context and data characteristics. Second, class imbalance in the data is corrected using the SMOTE-ENN technique. Finally, XGBoost is selected to construct a prediction model for patients' postoperative recovery and is compared with four other machine learning models: LR, SVM, CART, and RF. We find that XGBoost predicts postoperative facial nerve function recovery most accurately, with a prediction accuracy of 90.0% and an AUC value of 0.90. CART, RF, and XGBoost can further select the more important preoperative indicators and provide therapeutic assistance to physicians, thereby improving the patient's postoperative recovery. The results show that machine learning and SMOTE-ENN techniques can handle complex clinical data and achieve accurate predictions.
Affiliation(s)
- Jianing Wang
  - School of Mathematics and Statistics, Central South University, Changsha 410083, China
|
15
|
Bhende M, Thakare A, Pant B, Singhal P, Shinde S, Saravanan V. Deep Learning-Based Real-Time Discriminate Correlation Analysis for Breast Cancer Detection. BioMed Research International 2022; 2022:4609625. [PMID: 35800216 PMCID: PMC9256435 DOI: 10.1155/2022/4609625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/05/2022] [Revised: 05/28/2022] [Accepted: 06/11/2022] [Indexed: 12/04/2022]
Abstract
Breast cancer is the most common cancer in women, and a breast mass recognition model can effectively assist doctors in clinical diagnosis. However, the scarcity of medical image samples makes recognition models prone to overfitting. A breast mass recognition model integrated with deep pathological information mining is proposed. It constructs a sample selection strategy that screens high-quality samples across different mammography image datasets, addressing the scarcity of medical image samples from the perspective of data enhancement, and it mines the pathology contained in limited labeled models from shallow to deep information, addressing the shortage of medical image samples from the perspective of feature optimization. The multiview effective region gene optimization (MvERGS) algorithm is designed to refine the original image features, improve feature discrimination, and compress the feature dimension so that it better matches the number of samples. Discriminate correlation analysis (DCA) is then performed on the advanced new features, so that the in-depth cross-modal correlation between heterogeneous elements, that is, the deep pathological information, can be mined to describe the breast mass lesion area accurately. Based on deep pathological information and traditional classifiers, an efficient breast mass recognition model is trained to complete the classification of mammography images. Experiments show that the key technical indicators of the recognition model, including accuracy and AUC, are better than the mainstream baselines, and the overfitting problem caused by the scarcity of samples is alleviated.
Affiliation(s)
- Manisha Bhende
  - Marathwada Mitra Mandal's Institute of Technology, Pune, India
- Bhasker Pant
  - Department of Computer Science & Engineering, Graphic Era Deemed to Be University, Dehradun, Uttarakhand 248002, India
- Piyush Singhal
  - Department of Mechanical Engineering, GLA University, Mathura 281406, India
- Swati Shinde
  - Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune, India
- V. Saravanan
  - Department of Computer Science, College of Engineering and Technology, Dambi Dollo University, Dambi Dollo, Oromia Region, Ethiopia
|
16
|
Fuzzy Neural Network Expert System with an Improved Gini Index Random Forest-Based Feature Importance Measure Algorithm for Early Diagnosis of Breast Cancer in Saudi Arabia. Big Data and Cognitive Computing 2022. [DOI: 10.3390/bdcc6010013] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/02/2023]
Abstract
Breast cancer is one of the common malignancies among females in Saudi Arabia and is ranked as the most prevalent and the second deadliest disease in the country. However, the clinical diagnosis of any disease, such as breast cancer, coronary artery disease, diabetes, or COVID-19, is often associated with uncertainty due to the complexity and fuzziness of the process. In this work, a fuzzy neural network expert system with an improved Gini index random forest-based feature importance measure algorithm for early diagnosis of breast cancer in Saudi Arabia was proposed. It addresses the uncertainty and ambiguity associated with the diagnosis of breast cancer, as well as the heavier burden on the overlay of the network nodes of the fuzzy neural network system that often arises from insignificant features used to predict or diagnose the disease. An Improved Gini Index Random Forest-Based Feature Importance Measure Algorithm was used to select the five fittest features of the Diagnostic Wisconsin Breast Cancer Database out of the 32 features of the dataset. The logistic regression, support vector machine, k-nearest neighbor, random forest, and Gaussian naïve Bayes learning algorithms were used to develop two sets of classification models: models with the full 32 features and models with the 5 fittest features. The two sets of classification models were evaluated, and the results of the evaluation were compared. The comparison shows that the models with the selected fittest features outperformed their counterparts with full features in terms of accuracy, sensitivity, and specificity. Therefore, a fuzzy neural network-based expert system was developed with the five selected fittest features, and the system achieved 99.33% accuracy, 99.41% sensitivity, and 99.24% specificity. Moreover, compared with previous works that used fuzzy neural networks or other applied artificial intelligence techniques on the same dataset for diagnosis of breast cancer, the system performs best in terms of accuracy, sensitivity, and specificity. A z test was also conducted, and the test result shows that the accuracy achieved by the system for early diagnosis of breast cancer is statistically significant.
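Since the abstract above compares models by accuracy, sensitivity, and specificity, a short sketch of how these three quantities follow from a binary confusion matrix may help. The function name `diagnostic_metrics` and the toy labels are illustrative assumptions, not data from the study.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Labels are 1 (disease present) and 0 (disease absent).
    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy example (made-up labels): 4 diseased and 4 healthy cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(diagnostic_metrics(y_true, y_pred))
```

The sketch assumes both classes are present in `y_true`; otherwise one of the denominators is zero.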
|
17
|
Yu G, Chen Z, Wu J, Tan Y. A diagnostic prediction framework on auxiliary medical system for breast cancer in developing countries. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107459] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/20/2022]
|
18
|
Wang D, Chen Z, Zhao H. Prototype transfer generative adversarial network for unsupervised breast cancer histology image classification. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102713] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
|
19
|
Li J, Zhou Z, Dong J, Fu Y, Li Y, Luan Z, Peng X. Predicting breast cancer 5-year survival using machine learning: A systematic review. PLoS One 2021; 16:e0250370. [PMID: 33861809 PMCID: PMC8051758 DOI: 10.1371/journal.pone.0250370] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 01/21/2021] [Accepted: 04/06/2021] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Accurately predicting the survival rate of breast cancer patients is a major issue for cancer researchers. Machine learning (ML) has attracted much attention with the hope that it could provide accurate results, but its modeling methods and prediction performance remain controversial. The aim of this systematic review is to identify and critically appraise current studies regarding the application of ML in predicting the 5-year survival rate of breast cancer. METHODS: In accordance with the PRISMA guidelines, two researchers independently searched the PubMed (including MEDLINE), Embase, and Web of Science Core databases from inception to November 30, 2020. The search terms included breast neoplasms, survival, machine learning, and specific algorithm names. Included studies used ML to build a breast cancer survival prediction model and reported model performance that could be measured from the validation results. Studies in which the modeling process was not explained clearly or that had incomplete information were excluded. The extracted information included literature information, database information, data preparation and modeling process information, model construction and performance evaluation information, and candidate predictor information. RESULTS: Thirty-one studies that met the inclusion criteria were included, most of which were published after 2013. The most frequently used ML methods were decision trees (19 studies, 61.3%), artificial neural networks (18 studies, 58.1%), support vector machines (16 studies, 51.6%), and ensemble learning (10 studies, 32.3%). The median sample size was 37256 (range 200 to 659820) patients, and the median number of predictors was 16 (range 3 to 625). The accuracy of 29 studies ranged from 0.510 to 0.971. The sensitivity of 25 studies ranged from 0.037 to 1. The specificity of 24 studies ranged from 0.008 to 0.993. The AUC of 20 studies ranged from 0.500 to 0.972. The precision of 6 studies ranged from 0.549 to 1. All of the models were internally validated, and only one was externally validated. CONCLUSIONS: Overall, compared with traditional statistical methods, the performance of ML models does not necessarily show any improvement, and this area of research still faces limitations related to a lack of data preprocessing steps, excessive differences in sample feature selection, and issues related to validation. Further optimization of the performance of the proposed model is also needed in the future, which requires more standardization and subsequent validation.
Affiliation(s)
- Jiaxin Li
  - School of Nursing, Jilin University, Jilin, China
- Zijun Zhou
  - Breast Surgery, Jilin Province Tumor Hospital, Jilin, China
- Jianyu Dong
  - School of Nursing, Jilin University, Jilin, China
- Ying Fu
  - School of Nursing, Jilin University, Jilin, China
- Yuan Li
  - School of Nursing, Jilin University, Jilin, China
- Ze Luan
  - School of Nursing, Jilin University, Jilin, China
- Xin Peng
  - School of Nursing, Jilin University, Jilin, China
  - * E-mail:
|
20
|
Machine-Learning Provides Patient-Specific Prediction of Metastatic Risk Based on Innovative, Mechanobiology Assay. Ann Biomed Eng 2021; 49:1774-1783. [PMID: 33483841 DOI: 10.1007/s10439-020-02720-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/24/2020] [Accepted: 12/30/2020] [Indexed: 12/13/2022]
Abstract
Cancer mortality is mostly related to metastasis. Metastasis is currently prognosed via histopathology, disease statistics, or genetics; these are potentially inaccurate, not rapidly available, and require known markers. We have developed a rapid (~2 h) mechanobiology-based approach to provide early prognosis of the clinical likelihood for metastasis. Specifically, invasive cell subsets seeded on impenetrable, physiological-stiffness polyacrylamide gels forcefully indent the gels, while non-invasive/benign cells do not. The number of indenting cells and their attained depths, that is, the mechanical invasiveness, accurately define the metastatic risk of tumors and cell lines. Utilizing our experimental database, we compare the capacity of several machine learning models to predict the metastatic risk. Models underwent supervised training on individual experiments using classification from literature and commercial sources for established cell lines and clinical histopathology reports for tumor samples. We evaluated 2-class models, separating invasive/non-invasive (e.g. benign) samples, and obtained sensitivity and specificity of 0.92 and 1, respectively; this surpasses other works. We also introduce a novel approach, using 5-class models (i.e. normal, benign, cancer-metastatic-non/low/high), that provided average sensitivity and specificity of 0.69 and 0.91. Combining our rapid mechanical invasiveness assay with machine learning classification can provide accurate and early prognosis of metastatic risk, to support choice of treatments and disease management.
|
21
|
Functional and Structural Connectome Features for Machine Learning Chemo-Brain Prediction in Women Treated for Breast Cancer with Chemotherapy. Brain Sci 2020; 10:brainsci10110851. [PMID: 33198294 PMCID: PMC7696512 DOI: 10.3390/brainsci10110851] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/22/2020] [Revised: 11/07/2020] [Accepted: 11/11/2020] [Indexed: 12/12/2022] Open
Abstract
Breast cancer is the leading cancer among women worldwide, and a high number of breast cancer patients are struggling with psychological and cognitive disorders. In this study, we aim to use machine learning models to discriminate between chemo-brain participants and healthy controls (HCs) using connectomes (connectivity matrices) and topological coefficients. Nineteen female post-chemotherapy breast cancer (BC) survivors and 20 female HCs were recruited for this study. Participants in both groups received resting-state functional magnetic resonance imaging (rs-fMRI) and generalized q-sampling imaging (GQI). Logistic regression (LR), decision tree classifier (CART), and xgboost (XGB) were the models we adopted for classification. In connectome analysis, LR achieved an accuracy of 79.49% with the functional connectomes and an accuracy of 71.05% with the structural connectomes. In the topological coefficient analysis, accuracies of 87.18%, 82.05%, and 83.78% were obtained by the functional global efficiency with CART, the functional global efficiency with XGB, and the structural transitivity with CART, respectively. The areas under the curves (AUCs) were 0.93, 0.94, 0.87, 0.88, and 0.84, respectively. Our study showed the discriminating ability of functional connectomes, structural connectomes, and global efficiency. We hope our findings can contribute to an understanding of the chemo brain and the establishment of a clinical system for tracking chemo brain.
|
22
|
Zhong X, Luo T, Deng L, Liu P, Hu K, Lu D, Zheng D, Luo C, Xie Y, Li J, He P, Pu T, Ye F, Bu H, Fu B, Zheng H. Multidimensional Machine Learning Personalized Prognostic Model in an Early Invasive Breast Cancer Population-Based Cohort in China: Algorithm Validation Study. JMIR Med Inform 2020; 8:e19069. [PMID: 33164899 PMCID: PMC7683252 DOI: 10.2196/19069] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/05/2020] [Revised: 08/07/2020] [Accepted: 09/16/2020] [Indexed: 02/05/2023] Open
Abstract
Background: Current online prognostic prediction models for breast cancer, such as Adjuvant! Online and PREDICT, are based on specific populations. They have been well validated and widely used in the United States and Western Europe; however, several validation attempts in non-European countries have revealed suboptimal predictions. Objective: We aimed to develop an advanced breast cancer prognosis model for disease progression, cancer-specific mortality, and all-cause mortality by integrating tumor, demographic, and treatment characteristics from a large breast cancer cohort in China. Methods: This study was approved by the Clinical Test and Biomedical Ethics Committee of West China Hospital, Sichuan University on May 17, 2012. Data collection for this project was started in May 2017 and ended in March 2019. Data on 5293 women diagnosed with stage I to III invasive breast cancer between 2000 and 2013 were collected. Disease progression, cancer-specific mortality, all-cause mortality, and the likelihood of disease progression or death within a 5-year period were predicted. Extreme gradient boosting was used to develop the prediction model. Model performance was assessed by calculating the area under the receiver operating characteristic curve (AUROC), and the model was calibrated and compared with PREDICT. Results: The training, test, and validation sets comprised 3276 (499 progressions, 202 breast cancer-specific deaths, and 261 all-cause deaths within 5-year follow-up), 1405 (211 progressions, 94 breast cancer-specific deaths, and 129 all-cause deaths), and 612 (109 progressions, 33 breast cancer-specific deaths, and 37 all-cause deaths) women, respectively. The AUROC values for disease progression, cancer-specific mortality, and all-cause mortality were 0.76, 0.88, and 0.82 for the training set; 0.79, 0.80, and 0.83 for the test set; and 0.79, 0.84, and 0.88 for the validation set, respectively. Calibration analysis demonstrated good agreement between predicted and observed events within 5 years. Comparable AUROC and calibration results were confirmed in different age, residence status, and receptor status subgroups. Compared with PREDICT, our model showed similar AUROC and improved calibration values. Conclusions: Our prognostic model exhibits high discrimination and good calibration. It may facilitate prognosis prediction and clinical decision making for patients with breast cancer in China.
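Because this abstract, like several others in the list, reports discrimination as AUROC, a minimal sketch of the quantity may be useful. It uses the pairwise (Mann-Whitney) interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy scores are illustrative assumptions, not the study's data or code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 1 (event) or 0 (no event); scores are predicted risks.
    Runs in O(n_pos * n_neg), which is fine for a small illustration.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example (made-up risks): three of the four positive/negative pairs
# are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values quoted above should be read.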
Affiliation(s)
- Xiaorong Zhong
  - Department of Head, Neck and Mammary Gland Oncology, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Ting Luo
  - Department of Head, Neck and Mammary Gland Oncology, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Ling Deng
  - Laboratory of Molecular Diagnosis of Cancer, Clinical Research Center for Breast, West China Hospital, Sichuan University, Chengdu, China
- Pei Liu
  - Big Data Research Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Kejia Hu
  - Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Donghao Lu
  - Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Dan Zheng
  - Laboratory of Molecular Diagnosis of Cancer, Clinical Research Center for Breast, West China Hospital, Sichuan University, Chengdu, China
- Chuanxu Luo
  - Laboratory of Molecular Diagnosis of Cancer, Clinical Research Center for Breast, West China Hospital, Sichuan University, Chengdu, China
- Yuxin Xie
  - Department of Head, Neck and Mammary Gland Oncology, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Jiayuan Li
  - Department of Epidemiology and Biostatistics, West China School of Public Health, Sichuan University, Chengdu, China
- Ping He
  - Department of Head, Neck and Mammary Gland Oncology, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Tianjie Pu
  - Laboratory of Pathology, West China Hospital, Sichuan University, Chengdu, China
- Feng Ye
  - Laboratory of Pathology, West China Hospital, Sichuan University, Chengdu, China
- Hong Bu
  - Laboratory of Pathology, West China Hospital, Sichuan University, Chengdu, China
- Bo Fu
  - Big Data Research Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Hong Zheng
  - Laboratory of Molecular Diagnosis of Cancer, Clinical Research Center for Breast, West China Hospital, Sichuan University, Chengdu, China
|
23
|
Autoencoded DNA methylation data to predict breast cancer recurrence: Machine learning models and gene-weight significance. Artif Intell Med 2020; 110:101976. [PMID: 33250148 DOI: 10.1016/j.artmed.2020.101976] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/09/2019] [Revised: 08/05/2020] [Accepted: 10/18/2020] [Indexed: 12/29/2022]
Abstract
Breast cancer is the most frequent cancer in women and the second most frequent overall after lung cancer. Although the 5-year survival rate of breast cancer is relatively high, recurrence is also common and often involves metastasis, with its consequent threat to patients. DNA methylation-derived databases have become an interesting primary source for supervised knowledge extraction regarding breast cancer. Unfortunately, the study of DNA methylation involves the processing of hundreds of thousands of features for every patient. DNA methylation data are characterized by High Dimension Low Sample Size, which raises well-known issues regarding feature selection and generation. Autoencoders (AEs) appear as a specific technique for conducting nonlinear feature fusion. Our main objective in this work is to design a procedure to summarize DNA methylation by taking advantage of AEs. Our proposal is able to generate new features from the values of CpG sites of patients with and without recurrence. Then, a limited set of relevant genes to characterize breast cancer recurrence is proposed by applying survival analysis and a weighted ranking of genes according to the distribution of their CpG sites. To test our proposal, we selected a dataset from The Cancer Genome Atlas data portal and an AE with a single hidden layer. The literature and enrichment analysis (based on genomic context and functional annotation) conducted on the genes obtained with our experiment confirmed that all of these genes were related to breast cancer recurrence.
|
24
|
Fan Y, Li Y, Li Y, Feng S, Bao X, Feng M, Wang R. Development and assessment of machine learning algorithms for predicting remission after transsphenoidal surgery among patients with acromegaly. Endocrine 2020; 67:412-422. [PMID: 31673954 DOI: 10.1007/s12020-019-02121-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 07/04/2019] [Accepted: 10/21/2019] [Indexed: 12/11/2022]
Abstract
PURPOSE Preoperative prediction of transsphenoidal surgical (TSS) response is important for determining individual treatment strategies for acromegaly. There is currently no accurate predictive model for TSS response for acromegaly. The current study sought to develop and validate machine learning (ML)-based models for preoperative prediction of TSS response for acromegaly. METHODS Six hundred sixty-eight patients with acromegaly were enrolled and divided into training (n = 534) and test (n = 134) datasets in this retrospective data mining and ML study. The forward search algorithm was used to select features, and six ML algorithms were applied to construct TSS response prediction models. The performance of these ML models was validated using receiver operating characteristic analysis. Model calibration, discrimination ability, and clinical usefulness were also assessed. RESULTS Three hundred forty-nine (52.2%) patients achieved postoperative remission criteria and exhibited good TSS response. A univariate analysis was conducted, and eight features (age, hypertension, ophthalmic disorders, GH, IGF-1, nadir GH, maximal tumor diameter, and Knosp grade) were significantly associated with the TSS response in patients with acromegaly. After feature selection, the gradient boosting decision tree (GBDT) model, which was constructed with the eight significant features, showed the best discriminatory ability in both the training (AUC = 0.8555) and validation (AUC = 0.8178) cohorts. The GBDT model showed good discrimination ability and calibration, with the highest levels of accuracy and specificity, and provided better estimates of the TSS response of patients with acromegaly compared with using only the Knosp grade. Decision curve analysis confirmed that the model was clinically useful.
CONCLUSIONS ML-based models could aid neurosurgeons in the preoperative prediction of TSS response for patients with acromegaly, and could contribute to determining individual treatment strategies.
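The AUC values quoted in this abstract summarize receiver operating characteristic analysis. As a minimal sketch of what that number measures, the Python below computes AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (1 = remission, 0 = no remission).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs are ordered correctly
print(round(toy_auc, 4))
```

The pairwise form is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; library routines use a rank-based equivalent.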
Affiliation(s)
- Yanghua Fan
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 1 Shuai Fu Yuan, Dongcheng District, 100730, Beijing, China
- Yichao Li
  - DHC Software Co. Ltd, Beijing, China
- Shanshan Feng
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 1 Shuai Fu Yuan, Dongcheng District, 100730, Beijing, China
- Xinjie Bao
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 1 Shuai Fu Yuan, Dongcheng District, 100730, Beijing, China
- Ming Feng
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 1 Shuai Fu Yuan, Dongcheng District, 100730, Beijing, China
- Renzhi Wang
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 1 Shuai Fu Yuan, Dongcheng District, 100730, Beijing, China
|
25
|
Dai C, Fan Y, Li Y, Bao X, Li Y, Su M, Yao Y, Deng K, Xing B, Feng F, Feng M, Wang R. Development and Interpretation of Multiple Machine Learning Models for Predicting Postoperative Delayed Remission of Acromegaly Patients During Long-Term Follow-Up. Front Endocrinol (Lausanne) 2020; 11:643. [PMID: 33042013 PMCID: PMC7525125 DOI: 10.3389/fendo.2020.00643] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/03/2020] [Accepted: 08/07/2020] [Indexed: 12/11/2022] Open
Abstract
Background: Some patients with acromegaly do not reach the remission standard in the short term after surgery but achieve remission without additional postoperative treatment during long-term follow-up; this phenomenon is defined as postoperative delayed remission (DR). DR may complicate the interpretation of surgical outcomes in patients with acromegaly and interfere with decision-making regarding postoperative adjuvant therapy. Objective: We aimed to develop and validate machine learning (ML) models for predicting DR in acromegaly patients who have not achieved remission within 6 months of surgery. Methods: We enrolled 306 acromegaly patients and randomly divided them into training and test datasets. We used the recursive feature elimination (RFE) algorithm to select features and applied six ML algorithms to construct DR prediction models. The performance of these ML models was validated using receiver operating characteristic analysis. We used permutation importance, SHapley Additive exPlanations (SHAP), and local interpretable model-agnostic explanations (LIME) algorithms to determine the importance of the selected features and interpret the ML models. Results: Fifty-five (17.97%) acromegaly patients met the criteria for DR, and five features (post-1w rGH, post-1w nGH, post-6m rGH, post-6m IGF-1, and post-6m nGH) were significantly associated with DR in both the training and the test datasets. After the RFE feature selection, the XGBoost model, which comprised the 15 important features, had the greatest discriminatory ability (area under the curve = 0.8349, sensitivity = 0.8889, Youden's index = 0.6842). The XGBoost model showed good discrimination ability and provided significantly better estimates of DR of patients with acromegaly compared with using only the Knosp grade. The results obtained from permutation importance, SHAP, and LIME algorithms showed that post-6m IGF-1 is the most important feature in XGBoost algorithm prediction and showed the reliability and the clinical practicability of the XGBoost model in DR prediction. Conclusions: ML-based models can serve as an effective non-invasive approach to predicting DR and could aid in determining individual treatment and follow-up strategies for acromegaly patients who have not achieved remission within 6 months of surgery.
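The abstract reports sensitivity and Youden's index but not specificity. Under the conventional definition J = sensitivity + specificity - 1, and assuming both reported values refer to the same decision threshold, the specificity can be backed out. The sketch below does that arithmetic; the function names are hypothetical, and the derived specificity is an inference from the reported figures, not a value stated by the authors.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def implied_specificity(sensitivity, youden_j):
    """Solve J = sensitivity + specificity - 1 for specificity."""
    return youden_j - sensitivity + 1.0


# Values as reported in the abstract for the best-performing model.
reported_sensitivity = 0.8889
reported_youden = 0.6842

# Derived under the stated assumptions, not reported by the authors.
specificity = implied_specificity(reported_sensitivity, reported_youden)
print(round(specificity, 4))
```

A J of 0 corresponds to a classifier no better than chance at that threshold, and a J of 1 to perfect separation.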
Affiliation(s)
- Congxin Dai
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yanghua Fan
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yichao Li
  - DHC Mediway Technology Co., Ltd., Beijing, China
- Xinjie Bao
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yansheng Li
  - DHC Mediway Technology Co., Ltd., Beijing, China
- Mingliang Su
  - DHC Mediway Technology Co., Ltd., Beijing, China
- Yong Yao
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Kan Deng
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Bing Xing
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Feng Feng
  - Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Ming Feng
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  - *Correspondence: Ming Feng
- Renzhi Wang
  - Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  - Renzhi Wang
|