1
Ramezani Z, Charati JY, Alizadeh-Navaei R, Eslamijouybari M. Accelerated hazard prediction based on age time-scale for women diagnosed with breast cancer using a deep learning method. BMC Med Inform Decis Mak 2024; 24:314. [PMID: 39468511 PMCID: PMC11514944 DOI: 10.1186/s12911-024-02725-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 10/16/2024] [Indexed: 10/30/2024] Open
Abstract
Breast cancer is the most common cancer in women. Previous studies have investigated estimating and predicting proportional hazard rates and survival in breast cancer. This study predicts the accelerated hazards (AH) rate by age category in breast cancer patients using deep learning methods. The AH has a time-dependent structure whose rate changes according to time and variable effects. We collected data on 1225 female patients with breast cancer at the Mazandaran University of Medical Sciences. The patients' demographic and clinical characteristics, including family history, age, history of tobacco use, hysterectomy, first menstruation age, gravida, number of breastfeeding, disease grade, marital status, and survival status, were recorded. Initially, we predicted three age groups of patients: ≤ 40, 41-60, and ≥ 61 years. We then discuss the prediction of the accelerated risk value by age category for each breast cancer patient through deep learning, and the importance of variables using LightGBM. Improving clinical management and treatment of breast cancer requires advanced methods such as time-dependent AH calculation. When the behavioral effect is assumed to be a time-scale change between hazard functions, the AH model is more appropriate for randomized clinical trials. The study results demonstrate the proper performance of the proposed model for predicting AH by age category based on breast cancer patients' demographic and clinical characteristics.
Affiliation(s)
- Zahra Ramezani
- Department of Epidemiology and Biostatistics, School of Health, Health Sciences Research Center, Addiction Institute, Mazandaran University of Medical Sciences, Sari, Iran
- Jamshid Yazdani Charati
- Department of Epidemiology and Biostatistics, School of Health, Health Sciences Research Center, Addiction Institute, Mazandaran University of Medical Sciences, Sari, Iran.
- Reza Alizadeh-Navaei
- Gastrointestinal Cancer Research Center, Non-Communicable Diseases Research Institute, Mazandaran University of Medical Sciences, Sari, Iran
- Mohammad Eslamijouybari
- Department of Hematology and Oncology, Gastrointestinal Cancer Research Center, Mazandaran University of Medical Sciences, Sari, Iran
2
Cai K, Wang Z, Yang X, Fu W, Zhao X. Predicting Clinical Outcomes in COVID-19 and Pneumonia Patients: A Machine Learning Approach. Viruses 2024; 16:1624. [PMID: 39459956 PMCID: PMC11512216 DOI: 10.3390/v16101624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 10/05/2024] [Accepted: 10/15/2024] [Indexed: 10/28/2024] Open
Abstract
In the clinical diagnosis of pneumonia, particularly during the COVID-19 pandemic, individuals who progress to a critical stage requiring mechanical ventilation are classified as mechanically ventilated critically ill patients. Accurately predicting the discharge outcomes for this specific cohort, especially those with COVID-19, is of paramount clinical importance. Missing data, a common issue in medical research, can significantly impact the validity of analyses. In this work, we address this challenge by employing two missing data imputation techniques: multiple imputation and missForest, to enhance data completeness. Additionally, we utilize the smoothly clipped absolute deviation (SCAD) penalized logistic regression method to select significant features. Our real data analysis compares the predictive performances of extreme learning machines, random forests, support vector machines, and XGBoost using 10-fold cross-validation. The results consistently show that XGBoost outperforms the other methods in predicting discharge outcomes, making it a reliable tool for clinical decision-making in the treatment of severe pneumonia, including COVID-19 cases. Within this context, the random forest imputation method generally enhances performance, underscoring its effectiveness in managing missing data compared to multiple imputation.
Affiliation(s)
- Kaida Cai
- Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
- Department of Statistics and Actuarial Science, School of Mathematics, Southeast University, Nanjing 211189, China
- Key Laboratory of Environmental Medicine Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing 210009, China
- Zhengyan Wang
- Department of Statistics and Actuarial Science, School of Mathematics, Southeast University, Nanjing 211189, China
- Xiaofang Yang
- Department of Statistics and Actuarial Science, School of Mathematics, Southeast University, Nanjing 211189, China
- Wenzhi Fu
- Department of Statistics and Actuarial Science, School of Mathematics, Southeast University, Nanjing 211189, China
- Xin Zhao
- Department of Statistics and Actuarial Science, School of Mathematics, Southeast University, Nanjing 211189, China
- Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing 210096, China
3
Guo L, Xie Y, He J, Li X, Zhou W, Chen Q. Breast cancer prediction model based on clinical and biochemical characteristics: clinical data from patients with benign and malignant breast tumors from a single center in South China. J Cancer Res Clin Oncol 2023; 149:13257-13269. [PMID: 37480526 DOI: 10.1007/s00432-023-05181-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 07/11/2023] [Indexed: 07/24/2023]
Abstract
OBJECTIVE Breast cancer is the most prevalent cancer and the second leading cause of death from malignancy among women worldwide. In addition to tumor factors, the host characteristics of tumors have received increasing attention from the medical community. This study aimed to develop a breast cancer prediction model for the Chinese population using clinical and biochemical characteristics. METHODS This is a retrospective study. From 2012 to 2021, we selected 19,751 patients with breast diseases from the Guangdong Hospital of Traditional Chinese Medicine, which included 5660 patients with breast cancer and 14,091 patients with benign breast diseases. Using a total of 34 clinical and biochemical characteristics, 75% of patients were randomly assigned to the training group and 25% to the test group. Significant clinical signs were investigated, and a logistic regression model with recursive feature elimination (RFE) was used to develop a prediction model for distinguishing benign from malignant breast diseases. The prediction model's accuracy, precision, sensitivity, specificity, and area under the ROC curve (AUC) were calculated. RESULTS Clinical statistics demonstrated that the prediction model, comprising 19 clinical characteristics, had statistical separability in both the training group and the test group, as well as good sensitivity and prediction. CONCLUSIONS This model based on biochemical parameters demonstrates a significant predictive effect for breast cancer and may be useful as a reference for invasive tissue biopsy in patients undergoing BI-RADS 3 and 4A breast imaging.
Affiliation(s)
- Li Guo
- Department of Breast, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, No. 111 of Dade Road, Yuexiu District, Guangzhou, 510120, China
- Yanyan Xie
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, No. 232 Wide Ring East Road, Panyu District, Guangzhou, 510006, China
- Junhao He
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, No. 232 Wide Ring East Road, Panyu District, Guangzhou, 510006, China
- Xian Li
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, No. 232 Wide Ring East Road, Panyu District, Guangzhou, 510006, China
- Wu Zhou
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, No. 232 Wide Ring East Road, Panyu District, Guangzhou, 510006, China.
- Qianjun Chen
- Department of Breast, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, No. 111 of Dade Road, Yuexiu District, Guangzhou, 510120, China.
4
A Proposed Framework for Early Prediction of Schistosomiasis. Diagnostics (Basel) 2022; 12:diagnostics12123138. [PMID: 36553145 PMCID: PMC9777618 DOI: 10.3390/diagnostics12123138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 12/08/2022] [Accepted: 12/08/2022] [Indexed: 12/15/2022] Open
Abstract
Schistosomiasis is a neglected tropical disease that continues to be a leading cause of illness and mortality around the globe. The causative parasites attach to the skin through contaminated water and enter the human body. Failure to diagnose Schistosomiasis can result in various medical complications, such as ascites, portal hypertension, esophageal varices, splenomegaly, and growth retardation. Early prediction and identification of risk factors may aid in treating the disease before it becomes incurable. We aimed to create a framework incorporating the most significant features to predict Schistosomiasis using machine learning techniques. A dataset of advanced Schistosomiasis comprising 4316 individuals, including recovery and death cases, was used in this research. The dataset contains demographic, socioeconomic, and clinical factors with lab reports. Data preprocessing techniques (missing values imputation, outlier removal, data normalisation, and data transformation) were employed for better results. Feature selection techniques, including correlation-based feature selection, Information gain, gain ratio, ReliefF, and OneR, were utilised to reduce the large number of features. Data resampling algorithms, including Random undersampling, Random oversampling, Cluster Centroid, Near miss, and SMOTE, were applied to address the data imbalance problem. We applied four machine learning algorithms to construct the model: Gradient Boosting, Light Gradient Boosting, Extreme Gradient Boosting and CatBoost. The performance of the proposed framework was evaluated based on Accuracy, Precision, Recall and F1-Score. The results showed that the CatBoost model performed best, with the highest accuracy (87.1%), compared with Gradient Boosting (86%), Light Gradient Boosting (86.7%) and Extreme Gradient Boosting (86.9%). Our proposed framework will assist doctors and healthcare professionals in the early diagnosis of Schistosomiasis.
5
Rabiei R, Ayyoubzadeh SM, Sohrabei S, Esmaeili M, Atashi A. Prediction of Breast Cancer using Machine Learning Approaches. J Biomed Phys Eng 2022; 12:297-308. [PMID: 35698545 PMCID: PMC9175124 DOI: 10.31661/jbpe.v0i0.2109-1403] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/25/2021] [Accepted: 03/05/2022] [Indexed: 05/27/2023]
Abstract
BACKGROUND Breast cancer is considered one of the most common cancers in women and is caused by various clinical, lifestyle, social, and economic factors. Machine learning has the potential to predict breast cancer based on features hidden in data. OBJECTIVE This study aimed to predict breast cancer using different machine-learning approaches applied to demographic, laboratory, and mammographic data. MATERIAL AND METHODS In this analytical study, a database of 5,178 independent records, 25% of which belonged to breast cancer patients, with 24 attributes in each record, was obtained from the Motamed Cancer Institute (ACECR), Tehran, Iran. The random forest (RF), neural network (MLP), gradient boosting trees (GBT), and genetic algorithms (GA) were used in this study. Models were initially trained with demographic and laboratory features (20 features). The models were then trained with all demographic, laboratory, and mammographic features (24 features) to measure the effectiveness of mammography features in predicting breast cancer. RESULTS RF presented higher performance compared to other techniques (accuracy 80%, sensitivity 95%, specificity 80%, and the area under the curve (AUC) 0.56). Gradient boosting (AUC=0.59) showed a stronger performance compared to the neural network. CONCLUSION Combining multiple risk factors in modeling for breast cancer prediction could help the early diagnosis of the disease with necessary care plans. Collection, storage, and management of different data and intelligent systems based on multiple factors for predicting breast cancer are effective in disease management.
Affiliation(s)
- Reza Rabiei
- PhD, Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed Mohammad Ayyoubzadeh
- PhD, Department of Health Information Technology and Management, School of Allied Medical Sciences, Tehran University of Medical Science, Tehran, Iran
- Solmaz Sohrabei
- MSc, Department Deputy of Development, Management and Resources, Office of Statistic and Information Technology Management, Zanjan University of Medical Sciences, Zanjan, Iran
- Marzieh Esmaeili
- PhD, Department of Health Information Technology and Management, School of Allied Medical Sciences, Tehran University of Medical Science, Tehran, Iran
- Alireza Atashi
- PhD, Department of E-Health, Virtual School, Tehran University of Medical Sciences, Medical Informatics Research Group, Clinical Research Department, Breast Cancer Research Center, Motamed Cancer Institute, ACECR, Tehran, Iran
6
Nanglia S, Ahmad M, Ali Khan F, Jhanjhi N. An enhanced Predictive heterogeneous ensemble model for breast cancer prediction. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103279] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/02/2022]
7
Jaskani FH, Shaikh H, Khuhawar F, Amin MT, Azam MA, Aqeel M. Comparative Analysis of Urdu Parts Of Speech Taggers using Machine Learning Techniques. 2020 IEEE 23rd International Multitopic Conference (INMIC) 2020. [DOI: 10.1109/inmic50486.2020.9318205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
8
Latif MZ, Shaukat K, Luo S, Hameed IA, Iqbal F, Alam TM. Risk Factors Identification of Malignant Mesothelioma: A Data Mining Based Approach. 2020 International Conference on Electrical, Communication, and Computer Engineering (ICECCE) 2020. [DOI: 10.1109/icecce49384.2020.9179443] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 09/01/2023]