1
Potharlanka JL, M NB. Feature importance feedback with Deep Q process in ensemble-based metaheuristic feature selection algorithms. Sci Rep 2024; 14:2923. [PMID: 38316958 PMCID: PMC10844500 DOI: 10.1038/s41598-024-53141-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Accepted: 01/29/2024] [Indexed: 02/07/2024]
Abstract
Feature selection is an indispensable aspect of modern machine learning, especially for high-dimensional datasets where overfitting and computational inefficiencies are common concerns. Traditional methods often employ filter, wrapper, or embedded approaches, which have limitations in terms of robustness, computational load, or capability to capture complex interactions among features. Despite the utility of metaheuristic algorithms like Particle Swarm Optimization (PSO), Firefly Algorithm (FA), and Whale Optimization (WOA) in feature selection, there still exists a gap in efficiently incorporating feature importance feedback into these processes. This paper presents a novel approach that integrates the strengths of PSO, FA, and WOA algorithms into an ensemble model and further enhances its performance by incorporating a Deep Q-Learning framework for relevance feedback. The Deep Q-Learning module intelligently updates feature importance based on model performance, thereby fine-tuning the selection process iteratively. Our ensemble model demonstrates substantial gains in effectiveness over traditional and individual metaheuristic approaches. Specifically, the proposed model achieved 9.5% higher precision, 8.5% higher accuracy, 8.3% higher recall, 4.9% higher AUC, and 5.9% higher specificity across multiple software bug prediction datasets and samples. By resolving some of the key issues in existing feature selection methods and achieving superior performance metrics, this work paves the way for more robust and efficient machine learning models in various applications, from healthcare to natural language processing scenarios. This research provides an innovative framework for feature selection that promises not only superior performance but also a flexible architecture that can be adapted for a variety of machine learning challenges.
Affiliation(s)
- Jhansi Lakshmi Potharlanka
  - Department of Computer Science and Engineering, Vignan's Foundation for Science Technology and Research, Guntur, 522213, India.
- Nirupama Bhat M
  - Department of Computer Science and Engineering, Vignan's Foundation for Science Technology and Research, Guntur, 522213, India.
2
Liu J, Yang S, Zhang H, Sun Z, Du J. Online Multi-Label Streaming Feature Selection Based on Label Group Correlation and Feature Interaction. ENTROPY (BASEL, SWITZERLAND) 2023; 25:1071. [PMID: 37510018 PMCID: PMC10377943 DOI: 10.3390/e25071071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 07/10/2023] [Accepted: 07/14/2023] [Indexed: 07/30/2023]
Abstract
Multi-label streaming feature selection has received widespread attention in recent years because the dynamic acquisition of features is more in line with the needs of practical application scenarios. Most previous methods either assume that the labels are independent of each other, or, although label correlation is explored, the relationship between related labels and features is difficult to understand or specify. In real applications, both situations may occur where the labels are correlated and the features may belong specifically to some labels. Moreover, these methods treat features individually without considering the interaction between features. Based on this, we present a novel online streaming feature selection method based on label group correlation and feature interaction (OSLGC). In our design, we first divide labels into multiple groups with the help of graph theory. Then, we integrate label weight and mutual information to accurately quantify the relationships between features under different label groups. Subsequently, a novel feature selection framework using sliding windows is designed, including online feature relevance analysis and online feature interaction analysis. Experiments on ten datasets show that the proposed method outperforms some mature MFS algorithms in terms of predictive performance, statistical analysis, stability analysis, and ablation experiments.
Affiliation(s)
- Jinghua Liu
  - Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  - Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
  - Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China
- Songwei Yang
  - Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  - Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
  - Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China
- Hongbo Zhang
  - Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  - Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
  - Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China
- Zhenzhen Sun
  - Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  - Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
  - Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China
- Jixiang Du
  - Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  - Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
  - Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China
3
Nemat H, Khadem H, Elliott J, Benaissa M. Causality analysis in type 1 diabetes mellitus with application to blood glucose level prediction. Comput Biol Med 2023; 153:106535. [PMID: 36640530 DOI: 10.1016/j.compbiomed.2022.106535] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/18/2022] [Revised: 12/05/2022] [Accepted: 12/31/2022] [Indexed: 01/05/2023]
Abstract
Effective control of blood glucose level (BGL) is the key factor in the management of type 1 diabetes mellitus (T1D). BGL prediction is an important tool to help maximise the time BGL is in the target range and thus minimise both acute and chronic diabetes-related complications. To predict future BGL, histories of variables known to affect BGL, such as carbohydrate intake, injected bolus insulin, and physical activity, are utilised. Due to these identified cause and effect relationships, T1D management can be examined via the causality context. In this respect, this work initially investigates these relations and quantifies the causality strengths of each variable with BGL using the convergent cross mapping method (CCM). Then, considering the extended CCM, the causality strengths of each variable for different lags are quantified. After that, the optimal time lag for each variable is determined according to the quantified causality effects. Subsequently, the feasibility of leveraging causality information as prior knowledge for BGL prediction is investigated by proposing two approaches. In the first approach, causality strengths are used as weights for relevant affecting variables. In the second approach, the optimal causal lags and the corresponding causality strengths are considered the shifts and weights for the variables, respectively. Overall, the evaluation criteria and statistical analysis used for comparing results show the effectiveness of using causality analysis in T1D management.
Affiliation(s)
- Hoda Nemat
  - Department of Electronic and Electrical Engineering, University of Sheffield, Sheffield, S1 4DE, UK.
- Heydar Khadem
  - Department of Electronic and Electrical Engineering, University of Sheffield, Sheffield, S1 4DE, UK.
- Jackie Elliott
  - Department of Oncology and Metabolism, University of Sheffield, Sheffield S10 2RX, UK; Sheffield Teaching Hospitals, Diabetes and Endocrine Centre, Northern General Hospital, Sheffield S5 7AU, UK.
- Mohammed Benaissa
  - Department of Electronic and Electrical Engineering, University of Sheffield, Sheffield, S1 4DE, UK.
4
A novel feature selection method via mining Markov blanket. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03863-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
5
Online Markov Blanket Learning for High-Dimensional Data. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03841-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
6
Online group streaming feature selection using entropy-based uncertainty measures for fuzzy neighborhood rough sets. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00763-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Online group streaming feature selection, as an essential online processing method, can deal with dynamic feature selection tasks by considering the original group structure information of the features. Due to the fuzziness and uncertainty of the feature stream, some existing methods are unstable and yield low predictive accuracy. To address these issues, this paper presents a novel online group streaming feature selection method (FNE-OGSFS) using fuzzy neighborhood entropy-based uncertainty measures. First, a separability measure integrating the dependency degree with the coincidence degree is proposed and introduced into the fuzzy neighborhood rough sets model to define a new fuzzy neighborhood entropy. Second, inspired by both algebra and information views, some fuzzy neighborhood entropy-based uncertainty measures are investigated and some properties are derived. Furthermore, the optimal features in the group are selected to flow into the feature space according to the significance of features, and the features with interactions are left. Then, all selected features are re-evaluated by the Lasso model to discard the redundant features. Finally, an online group streaming feature selection algorithm is designed. Experimental results compared with eight representative methods on thirteen datasets show that FNE-OGSFS can achieve better comprehensive performance.