1
Fathima MD, Raja SP, Jayanthi K, Hariharan R. OptiStack classifier: optimized stacking framework with ensemble feature engineering for enhanced cardiovascular risk prediction. Inflamm Res 2025; 74:88. [PMID: 40448718 DOI: 10.1007/s00011-025-02039-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2025] [Revised: 04/08/2025] [Accepted: 04/13/2025] [Indexed: 06/02/2025]
Abstract
BACKGROUND Cardiovascular diseases (CVD) are a leading cause of morbidity and mortality globally, highlighting the urgent need for accurate risk prediction to improve early intervention and management. Traditional models have difficulty capturing the complex interactions between risk factors, which limits their predictive power. OBJECTIVE This paper proposes the OptiStack Classifier, an optimized stacking framework developed to enhance CVD risk prediction through ensemble feature engineering and machine learning techniques. METHODS The model uses dimensionality reduction and ensemble feature engineering methods, including polynomial expansion, binning and domain-specific feature transformation, to improve data representation. Principal Component Analysis (PCA) is used for dimensionality reduction, improving computational efficiency. A stacking framework integrates multiple machine learning algorithms as base learners, with Logistic Regression acting as the meta-classifier. Bayesian Optimization is applied for hyperparameter tuning, further boosting predictive performance. RESULTS The proposed model shows significant improvements in predicting CVD risk, which can support early diagnosis and prevention and lead to better health outcomes for patients.
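The METHODS paragraph names polynomial expansion and binning among the feature-engineering steps. The sketch below is a hypothetical, plain-Python illustration of those two transforms only; the function names, the degree-2 expansion, the equal-width binning rule and the sample values are assumptions, not details taken from the paper.

```python
from itertools import combinations_with_replacement


def polynomial_expand(row, degree=2):
    """Return all monomials of the input features up to `degree` (no bias term).

    For row [a, b] and degree 2 the output is [a, b, a*a, a*b, b*b].
    """
    features = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(row)), d):
            product = 1.0
            for idx in combo:
                product *= row[idx]
            features.append(product)
    return features


def equal_width_bins(values, n_bins=3):
    """Map each value to a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: everything falls in one bin
    width = (hi - lo) / n_bins
    # min() keeps the maximum value in the last bin rather than in bin n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical patient rows: [age in years, systolic blood pressure in mmHg]
rows = [[45, 120], [60, 140], [72, 165]]
expanded = [polynomial_expand(r) for r in rows]
age_bins = equal_width_bins([r[0] for r in rows], n_bins=3)
print(expanded[0])  # [45.0, 120.0, 2025.0, 5400.0, 14400.0]
print(age_bins)     # [0, 1, 2]
```

In a pipeline like the one described, such engineered columns would then pass through PCA and into the stacked base learners; that wiring is omitted here.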
Affiliation(s)
- M Dhilsath Fathima
- Department of Information Technology, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India.
- S P Raja
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- K Jayanthi
- Department of Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
- R Hariharan
- Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India

2
Ougahi JH, Rowan JS. Enhanced streamflow forecasting using hybrid modelling integrating glacio-hydrological outputs, deep learning and wavelet transformation. Sci Rep 2025; 15:2762. [PMID: 39843529 PMCID: PMC11754805 DOI: 10.1038/s41598-025-87187-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2024] [Accepted: 01/16/2025] [Indexed: 01/24/2025]
Abstract
Understanding snow and ice melt dynamics is vital for flood risk assessment and effective water resource management in populated river basins sourced in inaccessible high mountains. This study provides an AI-enabled hybrid approach integrating glacio-hydrological model outputs (GSM-SOCONT) with different machine learning and deep learning techniques, framed as alternative 'computational scenarios', leveraging both physical processes and data-driven insights for enhanced predictive capabilities. The standalone deep learning model (CNN-LSTM), relying solely on meteorological data, outperformed the equivalent machine learning and glacio-hydrological models. Hybrid models (CNN-LSTM1 to CNN-LSTM15) were trained using meteorological data augmented with glacio-hydrological model outputs representing ice and snow-melt contributions to streamflow. The hybrid model CNN-LSTM14, using only glacier-derived features, performed best, with high NSE (0.86), KGE (0.80), and R (0.93) values during calibration, and the highest NSE (0.83), KGE (0.88), and R (0.91) and lowest RMSE (892) and MAE (544) during validation. Finally, a multi-scale analysis of different feature permutations was carried out using wavelet transformation theory, and the results were integrated into the final hybrid model (CNN-LSTM19), which significantly enhances predictive accuracy, particularly for high-flow events, as evidenced by improved NSE (from 0.83 to 0.97) and reduced RMSE (from 892 to 442) during validation. The comparative analysis illustrates how AI-enhanced hydrological models improve the accuracy of runoff forecasting and provide more reliable and actionable insights for managing water resources and mitigating flood risks, despite the paucity of direct measurements.
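The calibration and validation scores quoted above (NSE, KGE, R, RMSE, MAE) are standard hydrological goodness-of-fit measures. A minimal sketch of how they are usually computed follows; it assumes the original 2009 formulation of KGE (the paper may use a variant, such as the 2012 version with a coefficient-of-variation ratio), and the discharge values are invented for illustration.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum of squared errors / spread of obs about its mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def kge(obs, sim):
    """Kling-Gupta efficiency, 2009 form: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / n
    r = cov / (so * ss)   # Pearson correlation coefficient
    alpha = ss / so       # ratio of standard deviations (variability)
    beta = ms / mo        # ratio of means (bias)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def rmse(obs, sim):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


# Hypothetical discharge series (same units for observed and simulated)
observed = [100.0, 150.0, 300.0, 250.0, 120.0]
simulated = [110.0, 140.0, 280.0, 260.0, 130.0]
print(nse(observed, simulated))  # 1 - 800/30120, about 0.973
print(kge(observed, simulated), rmse(observed, simulated), mae(observed, simulated))
```

An NSE or KGE of 1 means a perfect fit; an NSE of 0 means the simulation is no better than predicting the mean of the observations.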
Affiliation(s)
- Jamal Hassan Ougahi
- UNESCO Centre of Water Law, Policy & Science, University of Dundee, Dundee, UK.
- Higher Education Department, Government of the Punjab, Lahore, Pakistan.
- John S Rowan
- UNESCO Centre of Water Law, Policy & Science, University of Dundee, Dundee, UK

3
Yuan C, Shi Y, Ba Z, Liang D, Wang J, Liu X, Xu Y, Liu J, Xu H. Machine Learning Models for Predicting Thermal Properties of Radiative Cooling Aerogels. Gels 2025; 11:70. [PMID: 39852040 PMCID: PMC11765191 DOI: 10.3390/gels11010070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Revised: 01/02/2025] [Accepted: 01/14/2025] [Indexed: 01/26/2025]
Abstract
The escalating global climate crisis and energy challenges have made the development of efficient radiative cooling materials increasingly urgent. This study presents a machine-learning-based model for predicting the performance of radiative cooling aerogels (RCAs). The model integrates multiple parameters, including the material composition (matrix material type and proportions), modification design (modifier type and content), optical properties (solar reflectance and infrared emissivity), and environmental factors (solar irradiance and ambient temperature) to achieve accurate cooling performance predictions. A comparative analysis of various machine learning algorithms revealed that an optimized XGBoost model demonstrated superior predictive performance, achieving an R² value of 0.943 and an RMSE of 1.423 on the test dataset. An interpretability analysis using SHapley Additive exPlanations (SHAP) identified the ZnO modifier (SHAP value, 1.523) and environmental parameters (ambient temperature, 1.299; solar irradiance, 0.979) as the most significant determinants of cooling performance. A feature interaction analysis further elucidated the complex interplay between the material composition and environmental conditions, providing theoretical guidance for material optimization.
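SHAP values are Shapley values of a model's prediction. For a handful of features they can be computed exactly by enumerating every coalition of features, which makes the idea concrete. The sketch below is illustrative only: the toy cooling function, its coefficients, the baseline values and the baseline-replacement value function are assumptions, and tree-specific algorithms applied to XGBoost models estimate coalition values differently.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values for one prediction by enumerating all feature coalitions.

    Features outside a coalition are set to their `baseline` value (a simple
    interventional choice, assumed here for illustration).
    """
    n = len(x)

    def value(coalition):
        # Keep coalition features from x, take the rest from the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))  # marginal contribution of i
    return phi


# Hypothetical cooling model: additive terms plus one ZnO x temperature interaction
def toy_cooling(z):
    zno, temp, irradiance = z
    return 2.0 * zno + 0.5 * temp - 0.01 * irradiance + 0.1 * zno * temp


x = [1.0, 30.0, 800.0]      # [ZnO content, ambient temperature, solar irradiance]
base = [0.0, 25.0, 600.0]   # hypothetical reference input
phi = exact_shapley(toy_cooling, x, base)
print(phi, sum(phi))
```

The attributions sum to f(x) - f(baseline), here mathematically 12 - 6.5 = 5.5 up to floating-point rounding; the interaction term is split between the ZnO and temperature features.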
Affiliation(s)
- Chengce Yuan
- AVIC Shenyang Aircraft Corporation, Shenyang 110850, China
- Yimin Shi
- Key Laboratory of Bio-Based Material Science and Technology (Ministry of Education), Northeast Forestry University, Harbin 150040, China
- Zhichen Ba
- Key Laboratory of Bio-Based Material Science and Technology (Ministry of Education), Northeast Forestry University, Harbin 150040, China
- Daxin Liang
- Key Laboratory of Bio-Based Material Science and Technology (Ministry of Education), Northeast Forestry University, Harbin 150040, China
- Jing Wang
- Key Laboratory of Bio-Based Material Science and Technology (Ministry of Education), Northeast Forestry University, Harbin 150040, China
- Xiaorui Liu
- Key Laboratory of Bio-Based Material Science and Technology (Ministry of Education), Northeast Forestry University, Harbin 150040, China
- Yabei Xu
- Key Laboratory of Bio-Based Material Science and Technology (Ministry of Education), Northeast Forestry University, Harbin 150040, China
- Junreng Liu
- Key Laboratory of Bio-Based Material Science and Technology (Ministry of Education), Northeast Forestry University, Harbin 150040, China
- Hongbo Xu
- School of Chemistry and Chemical Engineering, Harbin Institute of Technology, Harbin 150001, China

4
Baniecki H, Sobieski B, Szatkowski P, Bombinski P, Biecek P. Interpretable machine learning for time-to-event prediction in medicine and healthcare. Artif Intell Med 2025; 159:103026. [PMID: 39579416 DOI: 10.1016/j.artmed.2024.103026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 08/03/2024] [Accepted: 11/15/2024] [Indexed: 11/25/2024]
Abstract
Time-to-event prediction, e.g. cancer survival analysis or hospital length of stay, is a highly prominent machine learning task in medical and healthcare applications. However, only a few interpretable machine learning methods address its challenges. To facilitate a comprehensive explanatory analysis of survival models, we formally introduce time-dependent feature effects and global feature importance explanations. We show how post-hoc interpretation methods can reveal biases in AI systems predicting length of stay, using a novel multi-modal dataset created from 1235 X-ray images with textual radiology reports annotated by human experts. Moreover, we evaluate cancer survival models beyond predictive performance to include the importance of multi-omics feature groups, based on a large-scale benchmark comprising 11 datasets from The Cancer Genome Atlas (TCGA). Model developers can use the proposed methods to debug and improve machine learning algorithms, while physicians can discover disease biomarkers and assess their significance. We contribute open data and code resources to facilitate future work in the emerging research direction of explainable survival analysis.
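The abstract introduces time-dependent feature effects for survival models. One simple way to picture such an effect, assumed here and not necessarily the authors' exact definition, is a partial-dependence curve evaluated at each time point of the predicted survival function. The Cox-type toy model, its baseline hazard and coefficients, and the patient rows are all hypothetical.

```python
import math


def toy_survival(x, t):
    """Hypothetical Cox-type model: S(t | x) = exp(-h0 * t * exp(beta . x))."""
    h0 = 0.05              # assumed constant baseline hazard
    beta = [0.8, 0.3]      # assumed coefficients
    risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return math.exp(-h0 * t * risk)


def survival_partial_dependence(model, data, feature, value, times):
    """Average predicted S(t) over the data with one feature fixed to `value`.

    Returns one value per time point, so the feature's effect can change over time.
    """
    curve = []
    for t in times:
        total = 0.0
        for row in data:
            z = list(row)
            z[feature] = value  # intervene on the chosen feature only
            total += model(z, t)
        curve.append(total / len(data))
    return curve


# Hypothetical patients: [tumour grade (0 or 1), standardized age]
data = [[0, -1.0], [1, 0.5], [0, 0.2], [1, 1.3]]
times = [0, 6, 12, 24]
low = survival_partial_dependence(toy_survival, data, 0, 0, times)
high = survival_partial_dependence(toy_survival, data, 0, 1, times)
effect = [h - l for h, l in zip(high, low)]  # grade effect on survival at each time
print([round(e, 3) for e in effect])
```

Because the output is a curve over time rather than a single number, a feature can matter little early on and strongly later, which a single time-independent score would not show.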
Affiliation(s)
- Hubert Baniecki
- University of Warsaw, Warsaw, Poland; Warsaw University of Technology, Warsaw, Poland.
- Bartlomiej Sobieski
- University of Warsaw, Warsaw, Poland; Warsaw University of Technology, Warsaw, Poland
- Patryk Szatkowski
- Warsaw University of Technology, Warsaw, Poland; Medical University of Warsaw, Warsaw, Poland
- Przemyslaw Bombinski
- Warsaw University of Technology, Warsaw, Poland; Medical University of Warsaw, Warsaw, Poland
- Przemyslaw Biecek
- University of Warsaw, Warsaw, Poland; Warsaw University of Technology, Warsaw, Poland

5
Ho CSH, Tan TWK, Khoe HCH, Chan YL, Tay GWN, Tang TB. Using an Interpretable Amino Acid-Based Machine Learning Method to Enhance the Diagnosis of Major Depressive Disorder. J Clin Med 2024; 13:1222. [PMID: 38592058 PMCID: PMC10931723 DOI: 10.3390/jcm13051222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 02/13/2024] [Accepted: 02/19/2024] [Indexed: 04/10/2024]
Abstract
Background: Major depressive disorder (MDD) is a leading cause of disability worldwide. At present, however, there are no established biomarkers that have been validated for diagnosing and treating MDD. This study sought to assess the diagnostic and predictive potential of differences in serum amino acid concentrations between MDD patients and healthy controls (HCs), integrating them into interpretable machine learning models. Methods: In total, 70 MDD patients and 70 HCs matched in age, gender, and ethnicity were recruited for the study. Serum amino acid profiling was conducted by means of chromatography-mass spectrometry. A total of 21 metabolites were analysed, 17 from a preset amino acid panel and the remaining 4 from a preset kynurenine panel. Logistic regression was applied to differentiate MDD patients from HCs. Results: The best-performing model utilised both feature selection and hyperparameter optimisation and yielded a moderate area under the receiver operating characteristic curve (AUC) of 0.76 on the testing data. The top five metabolites identified as potential biomarkers for MDD were 3-hydroxy-kynurenine, valine, kynurenine, glutamic acid, and xanthurenic acid. Conclusions: Our study highlights the potential of an interpretable, amino acid-based machine learning model to aid the diagnosis of MDD and increase its diagnostic accuracy in clinical practice.
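The reported AUC of 0.76 summarizes how well the classifier ranks MDD patients above healthy controls. A minimal sketch of that computation, using the Mann-Whitney pairwise formulation, is shown below; the labels and scores are invented, and the code is not the authors' evaluation pipeline.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half (Mann-Whitney U formulation). The pairwise loop is
    O(n_pos * n_neg), which is acceptable for cohorts of a few hundred subjects.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test-set outputs of a classifier: 1 = MDD, 0 = healthy control
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why 0.76 is described as moderate.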
Affiliation(s)
- Cyrus Su Hui Ho
- Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117543, Singapore
- Trevor Wei Kiat Tan
- Centre for Sleep and Cognition, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117543, Singapore
- Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117543, Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583, Singapore
- N.1 Institute for Health & Institute for Digital Medicine (WisDM), National University of Singapore, Singapore 117456, Singapore
- Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore 119077, Singapore
- Howard Cai Hao Khoe
- Singapore Psychiatry Residency, National Healthcare Group, Singapore 308433, Singapore
- Yee Ling Chan
- Centre for Intelligent Signal and Imaging Research (CISIR), Universiti Teknologi PETRONAS (UTP), Seri Iskandar 32610, Perak, Malaysia
- Gabrielle Wann Nii Tay
- Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117543, Singapore
- Tong Boon Tang
- Centre for Intelligent Signal and Imaging Research (CISIR), Universiti Teknologi PETRONAS (UTP), Seri Iskandar 32610, Perak, Malaysia

6
Leventhal EL, Daamen AR, Grammer AC, Lipsky PE. An interpretable machine learning pipeline based on transcriptomics predicts phenotypes of lupus patients. iScience 2023; 26:108042. [PMID: 37860757 PMCID: PMC10582499 DOI: 10.1016/j.isci.2023.108042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/21/2023] [Revised: 07/03/2023] [Accepted: 09/21/2023] [Indexed: 10/21/2023]
Abstract
Machine learning (ML) has the potential to identify subsets of patients with distinct phenotypes from gene expression data. However, phenotype prediction using ML has often relied on identifying important genes without a systems biology context. To address this, we created an interpretable ML approach based on blood transcriptomics to predict phenotype in systemic lupus erythematosus (SLE), a heterogeneous autoimmune disease. We employed a sequential grouped feature importance algorithm to assess how well gene sets known to be abnormal in SLE, including immune and metabolic pathways and cell types, predict disease activity and organ involvement. Gene sets related to interferon, tumor necrosis factor, the mitoribosome, and T cell activation were the best predictors of phenotype, with excellent performance. These results suggest potential relationships between the molecular pathways identified in each model and manifestations of SLE. This ML approach to phenotype prediction can be applied to other diseases and tissues.
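The abstract does not detail the sequential grouped feature importance algorithm. One plausible reading, assumed here, is a greedy forward pass: starting from no features, repeatedly add the gene set whose inclusion raises a performance score the most, and record that gain as the set's sequential importance. The scoring function, per-gene weights and group memberships below are hypothetical; in practice the score would be a cross-validated metric of a model trained on the selected genes.

```python
def sequential_group_importance(groups, score):
    """Greedy forward pass over (possibly overlapping) feature groups.

    groups: dict mapping a group name to a set of feature indices
    score:  callable taking a frozenset of feature indices and returning a
            performance value where higher is better
    Returns (group name, gain in score) pairs in the order the groups are added.
    """
    selected = frozenset()
    current = score(selected)
    remaining = dict(groups)
    path = []
    while remaining:
        # Gain from adding each remaining group on top of what is already selected
        gains = {name: score(selected | feats) - current for name, feats in remaining.items()}
        best = max(gains, key=gains.get)
        selected = selected | remaining.pop(best)
        current = score(selected)
        path.append((best, gains[best]))
    return path


# Hypothetical per-gene contributions to a performance score such as AUC
weights = [0.15, 0.10, 0.05, 0.08]


def toy_score(features):
    return 0.5 + sum(weights[f] for f in features)


groups = {"interferon": {0, 1}, "TNF": {1, 2}, "T cell activation": {3}}
for name, gain in sequential_group_importance(groups, toy_score):
    print(f"{name}: +{gain:.2f}")  # interferon +0.25, T cell activation +0.08, TNF +0.05
```

Overlapping groups are handled naturally: once the interferon set is selected, the TNF set is credited only for the gene it adds, so shared genes are not counted twice.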
Affiliation(s)
- Emily L. Leventhal
- AMPEL BioSolutions LLC, and the RILITE Research Institute, Charlottesville, VA 22902, USA
- Andrea R. Daamen
- AMPEL BioSolutions LLC, and the RILITE Research Institute, Charlottesville, VA 22902, USA
- Amrie C. Grammer
- AMPEL BioSolutions LLC, and the RILITE Research Institute, Charlottesville, VA 22902, USA
- Peter E. Lipsky
- AMPEL BioSolutions LLC, and the RILITE Research Institute, Charlottesville, VA 22902, USA

7
Sheikholeslami R, Hall JW. Global patterns and key drivers of stream nitrogen concentration: A machine learning approach. Sci Total Environ 2023; 868:161623. [PMID: 36657680 PMCID: PMC10933795 DOI: 10.1016/j.scitotenv.2023.161623] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/18/2022] [Revised: 12/22/2022] [Accepted: 01/11/2023] [Indexed: 06/17/2023]
Abstract
Anthropogenic loading of nitrogen to river systems can pose serious health hazards and create critical environmental threats. Quantifying the magnitude and impact of freshwater nitrogen requires identifying key controls of nitrogen dynamics and analyzing both past and present patterns of nitrogen flows. To tackle this challenge, we adopted a machine learning (ML) approach and built an ML-driven representation that captures spatiotemporal variability in nitrogen concentrations at global scale. Our model uses random forests to regress a large sample of monthly measured stream nitrogen concentrations onto a set of 17 predictors at a spatial resolution of 0.5° over 1990-2013, including observations within the pixel and upstream drivers. The model was validated with data from rivers outside the training dataset and was used to predict nitrogen concentrations in 520 major river basins of the world, including many with scarce or no observations. We predicted that the regions with the highest median nitrogen concentrations in their rivers (in 2013) were: United States (Mississippi), Pakistan, Bangladesh, India (Indus, Ganges), China (Yellow, Yangtze, Yongding, Huai), and most of Europe (Rhine, Danube, Vistula, Thames, Trent, Severn). Other major hotspots were the river basins of the Sebou (Morocco), Nakdong (South Korea), Kitakami (Japan), and Egypt's Nile Delta. Our analysis showed that the rate of increase in nitrogen concentration between the 1990s and 2000s was greatest in rivers located in eastern China, eastern and central parts of Canada, the Baltic states, Pakistan, mainland Southeast Asia, and south-eastern Australia. Using a new grouped variable importance measure, we also found that temporality (month of the year and cumulative month count) is the most influential predictor, followed by factors representing hydroclimatic conditions, diffuse nutrient emissions from agriculture, and topographic features.
Our model can be further applied to assess strategies designed to reduce nitrogen pollution in freshwater bodies at large spatial scales.
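The grouped variable importance mentioned above ranks predictor groups (temporal, hydroclimatic, agricultural, topographic) rather than single variables. A common way to do this, sketched here under assumptions, is grouped permutation importance: shuffle every column of a group with one shared row permutation and measure how much the prediction error grows. The toy model stands in for a fitted random forest, and the data, coefficients and group assignments are invented; the paper's new measure may be defined differently.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def grouped_permutation_importance(predict, X, y, groups, n_repeats=20, seed=0):
    """Mean increase in MSE when all columns of a group are shuffled together.

    One shared row permutation per group keeps within-group relationships
    (e.g. month and cumulative month count) intact while breaking the group's
    link to the target.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importance = {}
    for name, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            X_perm = []
            for i, row in enumerate(X):
                new_row = list(row)
                for c in cols:
                    new_row[c] = X[order[i]][c]  # same donor row for every column in the group
                X_perm.append(new_row)
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importance[name] = sum(increases) / n_repeats
    return importance


# Hypothetical stand-in for a fitted random forest: any callable row -> prediction works
def toy_model(row):
    month, cum_month, precip, fertilizer, slope = row
    return 0.5 * month + 0.02 * cum_month + 0.01 * precip + 0.05 * fertilizer  # slope unused


data_rng = random.Random(1)
X = [[m % 12 + 1, m, data_rng.uniform(0, 100), data_rng.uniform(0, 50), data_rng.uniform(0, 5)]
     for m in range(48)]
y = [toy_model(r) for r in X]
groups = {"temporal": [0, 1], "hydroclimate": [2], "agriculture": [3], "topography": [4]}
importance = grouped_permutation_importance(toy_model, X, y, groups)
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Permuting a group's columns jointly matters when they are correlated, as month of year and cumulative month count can be; permuting them one at a time can let each partially stand in for the other and understate the group's contribution.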
Affiliation(s)
- Razi Sheikholeslami
- School of Geography and the Environment, University of Oxford, Oxford, UK; Environmental Change Institute, University of Oxford, Oxford, UK; Department of Civil Engineering, Sharif University of Technology, Tehran, Iran.
- Jim W Hall
- School of Geography and the Environment, University of Oxford, Oxford, UK; Environmental Change Institute, University of Oxford, Oxford, UK

8
Patel S, Wang M, Guo J, Smith G, Chen C. A Study of R-R Interval Transition Matrix Features for Machine Learning Algorithms in AFib Detection. Sensors (Basel) 2023; 23:3700. [PMID: 37050761 PMCID: PMC10099376 DOI: 10.3390/s23073700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 03/14/2023] [Accepted: 03/28/2023] [Indexed: 06/19/2023]
Abstract
Atrial Fibrillation (AFib) is a heart condition that occurs when electrophysiological malformations within heart tissues cause the atria to lose coordination with the ventricles, resulting in "irregularly irregular" heartbeats. Because symptoms are subtle and unpredictable, AFib diagnosis is often difficult or delayed. One possible solution is to build a system that predicts AFib from the variability of R-R intervals (the time between two consecutive R-peaks). This research introduces the transition matrix as a novel measure of R-R variability and combines three segmentation schemes with two feature importance measures to systematically analyze the significance of individual features. The MIT-BIH dataset was first segmented under three schemes, yielding 5-s, 10-s, and 25-s subsets. In total, 21 features, including the transition matrix features, were extracted from these subsets and used to train 11 machine learning classifiers. Next, permutation importance and tree-based feature importance were calculated to determine the most predictive features for each model. With Leave-One-Person-Out Cross Validation, classifiers under the 25-s segmentation scheme produced the best accuracies, specifically Gradient Boosting (96.08%), Light Gradient Boosting (96.11%), and Extreme Gradient Boosting (96.30%). Among the eleven classifiers, the three gradient boosting models and Random Forest exhibited the highest overall performance across all segmentation schemes. Moreover, the permutation and tree-based importance results demonstrated that the transition matrix features were most significant with longer subset lengths.
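The abstract uses an R-R interval transition matrix as a measure of rhythm variability but does not define its states. The sketch below assumes three states (short, regular, long) relative to the segment median with a ±15% tolerance, counts transitions between consecutive intervals, and normalizes each row into probabilities. The state rule, tolerance and example R-peak times are hypothetical, not the authors' definitions.

```python
from statistics import median


def rr_intervals(r_peak_times):
    """Successive differences between R-peak times, in seconds."""
    return [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]


def rr_transition_matrix(rr, tolerance=0.15):
    """3x3 matrix of P(next state | current state) for short/regular/long R-R intervals.

    Assumed state rule (illustrative only): 'short' (0) if more than `tolerance`
    below the segment median, 'long' (2) if more than `tolerance` above it,
    otherwise 'regular' (1).
    """
    m = median(rr)

    def state(x):
        if x < m * (1 - tolerance):
            return 0
        if x > m * (1 + tolerance):
            return 2
        return 1

    states = [state(x) for x in rr]
    counts = [[0] * 3 for _ in range(3)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    # Normalize each row to probabilities; a state never visited keeps a zero row
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


# Regular rhythm: every interval is about 0.8 s, so all mass sits on regular -> regular
regular = rr_transition_matrix(rr_intervals([0.0, 0.8, 1.6, 2.4, 3.2]))
# Hypothetical irregular segment resembling an "irregularly irregular" rhythm
irregular = rr_transition_matrix([0.5, 1.1, 0.6, 0.9, 1.2, 0.55])
print(regular)
print(irregular)
```

Entries of such a matrix (for example the probability of staying in the regular state) can then be flattened into features for classifiers like the gradient boosting models listed above.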
Affiliation(s)
- Sahil Patel
- John T. Hoggard High School, Wilmington, NC 28403, USA
- Department of Mathematics and Statistics, University of North Carolina Wilmington, Wilmington, NC 28403, USA
- Maximilian Wang
- Department of Mathematics and Statistics, University of North Carolina Wilmington, Wilmington, NC 28403, USA
- Isaac M. Bear Early College High School, Wilmington, NC 28403, USA
- Justin Guo
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Georgia Smith
- Department of Biostatistics, University of Colorado’s Anschutz Medical Campus, Aurora, CO 80045, USA
- Cuixian Chen
- Department of Mathematics and Statistics, University of North Carolina Wilmington, Wilmington, NC 28403, USA

9
Ayano YM, Schwenker F, Dufera BD, Debelee TG. Interpretable Machine Learning Techniques in ECG-Based Heart Disease Classification: A Systematic Review. Diagnostics (Basel) 2022; 13:111. [PMID: 36611403 PMCID: PMC9818170 DOI: 10.3390/diagnostics13010111] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 12/05/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 12/31/2022]
Abstract
Heart disease is one of the leading causes of mortality throughout the world. Among the different heart diagnosis techniques, an electrocardiogram (ECG) is the least expensive non-invasive procedure. However, ECG-based diagnosis faces several challenges: the scarcity of medical experts, the complexity of ECG interpretation, the similar manifestations of different heart diseases in ECG signals, and heart disease comorbidity. Machine learning algorithms are viable alternatives to the traditional diagnosis of heart disease from ECG signals. However, the black-box nature of complex machine learning algorithms and the difficulty of explaining a model's outcomes are obstacles to medical practitioners' confidence in machine learning models. This observation paves the way for interpretable machine learning (IML) models as diagnostic tools that can build physicians' trust and provide evidence-based diagnoses. Therefore, in this systematic literature review, we studied and analyzed the research landscape of interpretable machine learning techniques, focusing on heart disease diagnosis from ECG signals. Our work makes several contributions. First, we present an elaborate discussion of interpretable machine learning techniques. In addition, we identify and characterize ECG signal recording datasets that are readily available for machine learning-based tasks. Furthermore, we identify the progress that has been achieved in ECG signal interpretation using IML techniques. Finally, we discuss the limitations and challenges of IML techniques in interpreting ECG signals.
Affiliation(s)
- Bisrat Derebssa Dufera
- Addis Ababa Institute of Technology, Addis Ababa University, Addis Ababa 11760, Ethiopia
- Taye Girma Debelee
- Ethiopian Artificial Intelligence Institute, Addis Ababa 40782, Ethiopia
- College of Electrical and Computer Engineering, Addis Ababa Science and Technology University, Addis Ababa 16417, Ethiopia