1
|
Rafiepoor H, Ghorbankhanloo A, Zendehdel K, Madar ZZ, Hajivalizadeh S, Hasani Z, Sarmadi A, Amanpour‐Gharaei B, Barati MA, Saadat M, Sadegh‐Zadeh S, Amanpour S. Comparison of Machine Learning Models for Classification of Breast Cancer Risk Based on Clinical Data. Cancer Rep (Hoboken) 2025; 8:e70175. [PMID: 40176498 PMCID: PMC11965882 DOI: 10.1002/cnr2.70175] [Received: 04/26/2024] [Revised: 02/01/2025] [Accepted: 02/20/2025] [Indexed: 04/04/2025] Open
Abstract
BACKGROUND: Breast cancer (BC) is a major global health concern, with rising incidence and mortality rates in many developing countries. Effective BC risk assessment models are crucial for prevention and early detection. While the Gail model, a traditional logistic regression-based model, has been widely used, its predictive performance may be limited by its linear assumptions. With the rapid advancement of artificial intelligence (AI) in medical sciences, various complex machine learning algorithms have been developed for risk prediction, including for BC.
AIMS: This study compares AI-based models with the traditional Gail model in assessing BC risk using a population dataset, and evaluates the performance of these models in predicting BC risk.
METHODS AND RESULTS: The study involved 942 newly diagnosed BC patients and 975 healthy controls at the Cancer Institute in IKH hospital Complex, Tehran. Ten classification algorithms were applied to the dataset. The accuracy, sensitivity, precision, and feature importance of the machine learning algorithms were assessed and compared with previous studies. The AI algorithms alone did not significantly improve predictability compared to the Gail model. However, the importance of variables varied significantly among the AI algorithms. Understanding feature importance and interactions is crucial in AI modeling in order to enhance accuracy and identify critical risk factors.
CONCLUSION: Incorporating specific risk factors, such as genetic and image-related variables, may be necessary to further enhance the accuracy of BC risk prediction models. Future research should also address modeling issues in models with a restricted number of features.
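As a minimal sketch of how the three reported evaluation metrics (accuracy, sensitivity, precision) are usually derived from a binary confusion matrix, the Python below may help. The function names and the toy labels are hypothetical and are not taken from the study; 1 is assumed to denote a BC case and 0 a healthy control.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels (assumed: 1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity (recall) and precision from the counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Toy labels for illustration only (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity measures how many true cases are detected, while precision measures how many predicted cases are real; the two can diverge sharply when cases and controls are imbalanced, which is less of a concern in a nearly balanced sample such as 942 cases and 975 controls.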
Affiliation(s)
- Haniyeh Rafiepoor
- Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Alireza Ghorbankhanloo
- Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Kazem Zendehdel
- Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Zahra Zangeneh Madar
- School of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran
- Department of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran
| | - Sepideh Hajivalizadeh
- Osteoporosis Research Center, Endocrinology and Metabolism Research Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Zeinab Hasani
- School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
| | - Ali Sarmadi
- Faculty of Mechanical Engineering, K. N. Toosi University of Technology, Tehran, Iran
| | - Behzad Amanpour‐Gharaei
- Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
| | | | - Mozafar Saadat
- Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham, UK
| | - Seyed‐Ali Sadegh‐Zadeh
- Department of Computing, School of Digital, Technologies and Arts, Staffordshire University, Stoke‐on‐Trent, UK
| | - Saeid Amanpour
- Cancer Biology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
| |
|
2
|
Baron C, Mehanna P, Daneault C, Hausermann L, Busseuil D, Tardif JC, Dupuis J, Des Rosiers C, Ruiz M, Hussin JG. Insights into heart failure metabolite markers through explainable machine learning. Comput Struct Biotechnol J 2025; 27:1012-1022. [PMID: 40160858 PMCID: PMC11953987 DOI: 10.1016/j.csbj.2025.02.041] [Received: 12/03/2024] [Revised: 02/25/2025] [Accepted: 02/27/2025] [Indexed: 04/02/2025] Open
Abstract
Understanding molecular traits through metabolomics offers an avenue to tailor cardiovascular prevention, diagnosis and treatment strategies more effectively. This study focuses on the application of machine learning (ML) and explainable artificial intelligence (XAI) algorithms to detect discriminant molecular signatures in heart failure (HF). We aim to uncover metabolites with significant predictive value by analyzing targeted metabolomics data through ML and XAI algorithms. After quality control, we analyzed 55 metabolites from 124 plasma samples, including 53 HF patients and 71 controls, comparing Ridge Logistic Regression, Support Vector Machine and eXtreme Gradient Boosting models. All achieved high accuracy in predicting group labels: 84.0% [95% CI: 75.3 - 92.7], 85.73% [95% CI: 78.6 - 92.9], and 84.8% [95% CI: 76.1 - 93.5], respectively. Permutation-based variable importance and Local Interpretable Model-agnostic Explanations (LIME) were used for group-level and individual-level explainability, respectively, complemented by H-Friedman statistics for variable interactions, yielding reliable, explainable insights into the ML models. Metabolites well known for their association with HF, such as glucose and cholesterol, and the more recently described C18:1 carnitine were reaffirmed in our analysis. The novel discovery of lignoceric acid (C24:0 fatty acid) as a critical discriminator was confirmed in a replication cohort, underscoring its potential as a metabolite marker. Furthermore, our study highlights the utility of 2-way variable interaction analysis in unveiling a network of metabolite interactions essential for accurate disease prediction. The results demonstrate our approach's efficacy in identifying key metabolites and their interactions, illustrating the power of ML and XAI in advancing personalized healthcare solutions.
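The group-level explainability step mentioned above, permutation-based variable importance, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the helper names, the toy threshold model and the toy data are all hypothetical, and accuracy is assumed as the score. The idea is to shuffle one feature column at a time and record the average drop in accuracy.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) matches the label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical model: uses only feature 0; feature 1 is pure noise.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0],
        [0.6, 3.0], [0.7, 5.5], [0.8, 0.5], [0.9, 2.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
baseline, importances = permutation_importance(toy_model, rows, labels)
print(baseline, importances)
```

In this toy setup the ignored feature receives an importance of exactly zero, while the feature the model relies on shows a positive drop. With real models the importance should be computed on held-out data, and correlated features can share or mask each other's importance.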
Affiliation(s)
- Cantin Baron
- Département de Biochimie et de Médecine Moléculaire, Université de Montréal, Montréal, Quebec, Canada
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Mila - Quebec AI Institute, Université de Montréal, Montréal, Quebec, Canada
| | - Pamela Mehanna
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
| | | | | | - David Busseuil
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
| | - Jean-Claude Tardif
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Département de médecine, Université de Montréal, Montréal, Quebec, Canada
| | - Jocelyn Dupuis
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Département de médecine, Université de Montréal, Montréal, Quebec, Canada
| | - Christine Des Rosiers
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Département de Nutrition, Université de Montréal, Montréal, Quebec, Canada
| | - Matthieu Ruiz
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Département de Nutrition, Université de Montréal, Montréal, Quebec, Canada
| | - Julie G. Hussin
- Département de Biochimie et de Médecine Moléculaire, Université de Montréal, Montréal, Quebec, Canada
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Mila - Quebec AI Institute, Université de Montréal, Montréal, Quebec, Canada
- Département de médecine, Université de Montréal, Montréal, Quebec, Canada
| |
|
3
|
Shin H, Oh S. An effective heuristic for developing hybrid feature selection in high dimensional and low sample size datasets. BMC Bioinformatics 2024; 25:390. [PMID: 39722052 DOI: 10.1186/s12859-024-06017-9] [Received: 10/14/2024] [Accepted: 12/17/2024] [Indexed: 12/28/2024] Open
Abstract
BACKGROUND: High-dimensional datasets with low sample sizes (HDLSS) are pivotal in the fields of biology and bioinformatics. One of the core objectives in HDLSS analysis is to select the most informative features and discard redundant or irrelevant ones. This is particularly crucial in bioinformatics, where accurate feature (gene) selection can lead to breakthroughs in drug development and provide insights into disease diagnostics. Despite its importance, identifying optimal features is still a significant challenge in HDLSS.
RESULTS: To address this challenge, we propose an effective feature selection method that combines gradual permutation filtering with a heuristic tribrid search strategy, specifically tailored for HDLSS contexts. The proposed method considers inter-feature interactions and leverages feature rankings during the search process. In addition, a new performance metric for HDLSS that evaluates both the number and quality of selected features is suggested. In a comparison with existing methods on the benchmark dataset, the proposed method reduced the average number of selected features from 37.8 to 5.5 and improved the performance of the prediction model, based on the selected features, from 0.855 to 0.927.
CONCLUSIONS: The proposed method effectively selects a small number of important features and achieves high prediction performance.
Affiliation(s)
- Hyunseok Shin
- Department of Computer Science, Dankook University, Youngin, Gyeonggi, South Korea
| | - Sejong Oh
- Department of Software Science, Dankook University, Youngin, Gyeonggi, South Korea.
| |
|
4
|
Jin J, Wu Y, Cao P, Zheng X, Zhang Q, Chen Y. Potential and challenge in accelerating high-value conversion of CO2 in microbial electrosynthesis system via data-driven approach. Bioresource Technology 2024; 412:131380. [PMID: 39214179 DOI: 10.1016/j.biortech.2024.131380] [Received: 07/17/2024] [Revised: 08/26/2024] [Accepted: 08/27/2024] [Indexed: 09/04/2024]
Abstract
Microbial electrosynthesis for CO2 utilization (MESCU), which produces valuable chemicals with high energy density, has garnered attention due to its long-term stability and high coulombic efficiency. Data-driven approaches offer a promising avenue by leveraging existing data to uncover underlying patterns. This comprehensive review examines the potential of data-driven approaches to enhance the high-value conversion of CO2 via MESCU. First, critical challenges in advancing MESCU are identified, including reactor configuration, cathode design, and microbial analysis. Subsequently, the potential of data-driven approaches to tackle these challenges is discussed, encompassing the identification of pivotal parameters governing reactor setup and cathode design and the deciphering of omics data derived from microbial communities. Finally, future directions for data-driven approaches in supporting the application of MESCU are addressed. This review offers guidance and theoretical support for future data-driven applications to accelerate MESCU research and potential industrialization.
Affiliation(s)
- Jiasheng Jin
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
| | - Yang Wu
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China.
| | - Peiyu Cao
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
| | - Xiong Zheng
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Key Laboratory of Yangtze River Water Environment, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China.
| | - Qingran Zhang
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
| | - Yinguang Chen
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China
| |
|
5
|
Zhang M, Kuo TT. Early prediction of long hospital stay for Intensive Care units readmission patients using medication information. Comput Biol Med 2024; 174:108451. [PMID: 38603899 PMCID: PMC11385457 DOI: 10.1016/j.compbiomed.2024.108451] [Received: 11/08/2023] [Revised: 03/21/2024] [Accepted: 04/07/2024] [Indexed: 04/13/2024]
Abstract
OBJECTIVE: Predicting Intensive Care Unit (ICU) Length of Stay (LOS) accurately can improve patient wellness, hospital operations, and the health system's financial status. This study focuses on predicting prolonged ICU LOS (≥3 days) for the 2nd admission, utilizing short historical data (1st admission only) for early-stage prediction and incorporating medication information.
MATERIALS AND METHODS: We selected 18,572 ICU patients' records from the MIMIC-IV database for this study. We applied five machine learning classifiers: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), AdaBoost (AB) and XGBoost (XGB). We computed both the sum dose and the average dose for each medication and included them in our model.
RESULTS: The RF model performed best among the models, with an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.716 and an Expected Calibration Error (ECE) of 0.023.
DISCUSSION: Calibration improved all five classifiers (LR, RF, SVM, AB, XGB) in terms of ECE. The two most important features for RF are the length of the 1st admission and the patient's age when they visited the hospital. The most important medication features are Phytonadione and Metoprolol Succinate XL. Both the sum and the average dose of the medication features contributed to the prediction task.
CONCLUSION: Our model showed the capability to predict prolonged ICU LOS for the 2nd admission by utilizing the demographic, diagnosis, and medication information from the 1st admission. This method can potentially support the prevention of patient complications and enhance resource allocation in hospitals.
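For readers unfamiliar with the Expected Calibration Error reported above, the following Python is a minimal sketch of one common binary formulation: predicted probabilities of the positive class are grouped into equal-width bins, and the gap between the mean predicted probability and the observed positive rate is averaged with weights proportional to bin size. The binning scheme, the function name and the toy inputs are assumptions; the study may have used a different ECE variant.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE with equal-width probability bins (an assumed variant).

    y_true: 0/1 labels; y_prob: predicted probability of the positive class.
    """
    n = len(y_true)
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)   # positive rate
        predicted = sum(p for _, p in members) / len(members)  # mean confidence
        ece += (len(members) / n) * abs(observed - predicted)
    return ece


# Toy example (not study data): two bins, each off by about 0.2.
ece = expected_calibration_error([0, 0, 1, 1], [0.1, 0.3, 0.7, 0.9], n_bins=2)
print(round(ece, 3))
```

A value near zero, such as the reported 0.023, indicates that predicted probabilities track observed frequencies closely. ECE is sensitive to the number of bins, so values are only comparable when the binning is the same.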
Affiliation(s)
- Min Zhang
- Applied Statistics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Tsung-Ting Kuo
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA.
| |
|
6
|
Silva VP, Carvalho RDA, Rêgo JHDS, Evangelista F. Machine Learning-Based Prediction of the Compressive Strength of Brazilian Concretes: A Dual-Dataset Study. Materials (Basel, Switzerland) 2023; 16:4977. [PMID: 37512252 PMCID: PMC10381529 DOI: 10.3390/ma16144977] [Received: 05/22/2023] [Revised: 06/10/2023] [Accepted: 06/11/2023] [Indexed: 07/30/2023]
Abstract
Several machine learning (ML) techniques have recently emerged as efficient alternatives for predicting how component properties influence the properties of the final mixture. In civil engineering, recent research already applies ML techniques to conventional concrete dosages. Because this methodology is already being applied internationally, it is necessary to verify its applicability to Brazilian databases or to databases built from national input data. In this research, one of these techniques, an artificial neural network (ANN), is used to determine the compressive strength of conventional Brazilian concrete at 7 and 28 days, using a database built from congress publications and academic works and comparing it with the reference database of Yeh. The data were organized into nine variables, and the samples assigned to the training and test sets varied across five cases. The eight possible input variables were: consumption of cement, blast furnace slag, pozzolana, water, additive, fine aggregate, coarse aggregate, and age. The response variable was the compressive strength of the concrete. Using international data as the training set and Brazilian data as the test set, or vice versa, did not give satisfactory results. The results varied across the five scenarios; however, when the Brazilian and reference datasets were combined to form the training and test sets, higher R2 values were obtained, showing that the union of the two databases yields a good predictive model.
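The R2 values used to compare the five scenarios can be computed as in the Python sketch below, which uses the standard coefficient of determination, 1 minus the ratio of residual to total sum of squares. The function name and the strength values are hypothetical illustrations, not data from either database.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical compressive strengths in MPa (illustration only).
measured = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 28.0, 41.0, 49.0]
r2 = r_squared(measured, predicted)
print(round(r2, 3))
```

When a model trained on one database is evaluated on another, R2 can be low or even negative, because the test-set mean may predict better than a model fitted to a differently distributed training set. This is consistent with the unsatisfactory cross-database results described above.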
Affiliation(s)
- Vitor Pereira Silva
- Department of Civil and Environmental Engineering, SG-12, University of Brasília (UnB), Brasilia 70910-900, Brazil
| | - Ruan de Alencar Carvalho
- Department of Civil and Environmental Engineering, SG-12, University of Brasília (UnB), Brasilia 70910-900, Brazil
| | - João Henrique da Silva Rêgo
- Department of Civil and Environmental Engineering, SG-12, University of Brasília (UnB), Brasilia 70910-900, Brazil
| | - Francisco Evangelista
- Department of Civil and Environmental Engineering, SG-12, University of Brasília (UnB), Brasilia 70910-900, Brazil
| |
|
7
|
Dikeç G, Özer D. Protocol Registration and Reporting of Systematic Review and Meta-Analyses Published in Psychiatric and Mental Health Nursing Journals: A Descriptive Study. Issues Ment Health Nurs 2023:1-8. [PMID: 37279384 DOI: 10.1080/01612840.2023.2212768] [Indexed: 06/08/2023]
Abstract
Although it is not mandatory for systematic review and meta-analysis studies, protocol registration is essential in the prevention of biases. This study aims to investigate the protocol registration status and reporting of systematic reviews and meta-analyses published in psychiatric nursing journals. The data for this descriptive study were obtained by scanning the 10 mental health and psychiatric nursing journals in which the studies of psychiatric nurses were most frequently published and by examining the systematic reviews and meta-analyses published between 2012 and 2022. A total of 177 completed studies were reviewed. Of the examined systematic reviews and meta-analyses, 18.6% had a protocol registration. Almost all (96.9%) of the registered studies were registered with PROSPERO, and 72.7% were registered prospectively. The registration status of the studies differed statistically according to the country where the studies' authors were located. Overall, approximately one out of every five published studies was registered. With the prospective registration of systematic reviews, biases can be prevented, and evidence-based interventions can be made based on the knowledge obtained.
Affiliation(s)
- Gül Dikeç
- Department of Nursing, Fenerbahce University, Istanbul, Turkey
| | - Duygu Özer
- Sultan II. Abdulhamid Han Training and Research Hospital, Psychiatric Clinic, Istanbul, Turkey
| |
|
8
|
Wu M, Qi C, Chen Q, Liu H. Evaluating the metal recovery potential of coal fly ash based on sequential extraction and machine learning. Environmental Research 2023; 224:115546. [PMID: 36828251 DOI: 10.1016/j.envres.2023.115546] [Received: 01/11/2023] [Revised: 02/14/2023] [Accepted: 02/21/2023] [Indexed: 06/18/2023]
Abstract
Given the depletion of metal resources and the potential leaching of toxic elements from solid waste, secondary recovery of metal from solid waste is essential to achieve coordinated development of resources and the environment. In this study, hybrid models combining the gradient boosting decision tree and particle swarm optimization algorithm were constructed and compared based on two different datasets. Additionally, a new, quantitative evaluation index for metal recovery potential (MRP) was proposed. The results showed that the model constructed using more elemental properties could more accurately predict metal fractions in coal fly ash (CFA) with an R2 value of 0.88 achieved on the testing set. The MRP index revealed that the DAT sample had the greatest recovery potential (MRP = 43,311.70). Ca was easier to recover due to its high concentration and presence mostly in soluble fractions. Model post-analysis highlighted that the elemental properties and total concentrations generally exerted a greater influence on the metal fractions. The innovative evaluation strategy based on machine learning and sequential extraction presented in this work provides an important reference for maximizing metal recovery from CFA to achieve environmental and economic benefits with the goal of sustainable development.
Affiliation(s)
- Mengting Wu
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
| | - Chongchong Qi
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China; School of Metallurgy and Environment, Central South University, Changsha, 410083, China.
| | - Qiusong Chen
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
| | - Hui Liu
- School of Metallurgy and Environment, Central South University, Changsha, 410083, China
| |
|