1
Li S, Wang T, Yin H, Ding S, Cai Z. Behavioral Analysis of Postgraduate Education Satisfaction: Unveiling Key Influencing Factors with Bayesian Networks and Feature Importance. Behav Sci (Basel) 2025; 15:559. [PMID: 40282180 PMCID: PMC12024229 DOI: 10.3390/bs15040559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2025] [Revised: 04/18/2025] [Accepted: 04/19/2025] [Indexed: 04/29/2025] Open
Abstract
Accurately evaluating postgraduate education satisfaction is crucial for improving higher education quality and optimizing management practices. Traditional methods often fail to capture the complex behavioral interactions among influencing factors. In this study, an innovative satisfaction indicator system framework is proposed that integrates a two-stage feature optimization method and the Tree Augmented Naive Bayes (TAN) model. The framework is designed to assess key satisfaction drivers across seven dimensions: course quality, research projects, mentor guidance, mentor's role, faculty management, academic enhancement, and quality development. Using data from 8903 valid responses, Confirmatory Factor Analysis (CFA) was conducted to validate the framework's reliability. The two-stage feature optimization method, including statistical pre-screening and XGBoost-based recursive feature selection, refined 49 features to 29 core indicators. The TAN model was used to construct a causal network, revealing the dynamic relationships between factors shaping satisfaction. The model outperformed four common machine learning algorithms, achieving an AUC value of 91.01%. The Birnbaum importance metric was employed to quantify the contribution of each feature, revealing the critical roles of academic resilience, academic aspirations, dedication and service spirit, creative ability, academic standards, and independent academic research ability. This study offers management recommendations, including enhancing academic support, mentorship, and interdisciplinary learning. Its findings provide data-driven insights for optimizing key indicators and improving postgraduate education satisfaction, contributing to behavioral sciences by linking satisfaction to outcomes and practices.
Affiliation(s)
- Sheng Li: Graduate School, Northwestern Polytechnical University, Xi’an 710072, China
- Ting Wang: Department of Industrial Engineering, Northwestern Polytechnical University, Xi’an 710072, China
- Hanqing Yin: Graduate School, Northwestern Polytechnical University, Xi’an 710072, China
- Shuai Ding: School of Public Policy and Administration, Northwestern Polytechnical University, Xi’an 710072, China
- Zhiqiang Cai: Department of Industrial Engineering, Northwestern Polytechnical University, Xi’an 710072, China
2
Du Y, Zhou X, Gao Q, Yang C, Huang T. A Deep Reinforcement Learning-Based Feature Selection Method for Invasive Disease Event Prediction Using Imbalanced Follow-Up Data. IEEE J Biomed Health Inform 2025; 29:1472-1483. [PMID: 40030195 DOI: 10.1109/jbhi.2024.3497325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2025]
Abstract
The machine learning-based model is a promising paradigm for predicting invasive disease events (iDEs) in breast cancer. Feature selection (FS) is an essential preprocessing technique employed to identify the pertinent features for the prediction model. However, conventional FS methods often fail with imbalanced clinical data due to the bias towards the majority class. In this paper, a novel FS framework based on reinforcement learning (RLFS) is developed to identify the optimal feature subset for imbalanced data. The RLFS employs an iterative methodology, wherein a data resampling technique generates a balanced dataset before each iteration. A decision network is trained using a deep RL algorithm to identify the relevant features for the dataset in the current iteration. With such an iterative training strategy, the numerous constructed datasets gradually boost the FS capacity of the decision network, resulting in robust performance on imbalanced data. Finally, a weighted model is proposed to determine the most suitable FS solution. The RLFS is employed to predict breast cancer iDEs using real follow-up data. The comparison results demonstrate that RLFS effectively reduces the number of features while outperforming several state-of-the-art FS algorithms.
3
Yang G, Li W, Xie W, Wang L, Yu K. An improved binary particle swarm optimization algorithm for clinical cancer biomarker identification in microarray data. Computer Methods and Programs in Biomedicine 2024; 244:107987. [PMID: 38157825 DOI: 10.1016/j.cmpb.2023.107987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/04/2023] [Accepted: 12/16/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND AND OBJECTIVE: The limited number of samples and high-dimensional features in microarray data make selecting a small number of features for disease diagnosis a challenging problem. Traditional feature selection methods based on evolutionary algorithms struggle to find the optimal set of features in a limited time when dealing with the high-dimensional feature selection problem. New solutions are proposed to solve the above problems. METHODS: In this paper, we propose a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data, which combines clustering and improved binary particle swarm optimization while incorporating an embedded feature elimination strategy. Firstly, an adaptive redundant feature judgment method based on correlation clustering is proposed for feature screening to reduce the search space in the subsequent stage. Secondly, we propose an improved flipping probability-based binary particle swarm optimization (IFBPSO), which is better suited to the binary particle swarm optimization problem. Finally, we also design a new feature elimination (FE) strategy embedded in the binary particle swarm optimization algorithm. This strategy gradually removes poorer features during iterations to reduce the number of features and improve accuracy. RESULTS: We compared C-IFBPFE with other published hybrid feature selection methods on eight public datasets and analyzed the impact of each improvement. The proposed method outperforms other current state-of-the-art feature selection methods in terms of accuracy, number of features, sensitivity, and specificity. The ablation study of this method validates the efficacy of each component; in particular, the proposed feature elimination strategy significantly improves the performance of the algorithm. CONCLUSIONS: The hybrid feature selection method proposed in this paper helps address the issue of high-dimensional microarray data with few samples. It can select a small subset of features and achieve high classification accuracy on microarray datasets. Additionally, independent validation of the selected features shows that those chosen by C-IFBPFE have strong correlations with disease phenotypes and can identify important biomarkers from data related to biomedical problems.
Affiliation(s)
- Guicheng Yang: College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Wei Li: Key Laboratory of Intelligent Computing in Medical Image (MIIC), Northeastern University, Ministry of Education, Shenyang, 110000, Liaoning, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, 110819, Liaoning, China.
- Weidong Xie: College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Linjie Wang: College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Kun Yu: College of Medicine and Bioinformation Engineering, Northeastern University, Shenyang, 110819, Liaoning, China.
4
Barrera-García J, Cisternas-Caneo F, Crawford B, Gómez Sánchez M, Soto R. Feature Selection Problem and Metaheuristics: A Systematic Literature Review about Its Formulation, Evaluation and Applications. Biomimetics (Basel) 2023; 9:9. [PMID: 38248583 PMCID: PMC10813816 DOI: 10.3390/biomimetics9010009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Revised: 12/16/2023] [Accepted: 12/18/2023] [Indexed: 01/23/2024] Open
Abstract
Feature selection is becoming a relevant problem within the field of machine learning. The feature selection problem focuses on the selection of the small, necessary, and sufficient subset of features that represent the general set of features, eliminating redundant and irrelevant information. Given the importance of the topic, in recent years there has been a boom in the study of the problem, generating a large number of related investigations. Given this, this work analyzes 161 articles published between 2019 and 2023 (20 April 2023), emphasizing the formulation of the problem and performance measures, and proposing classifications for the objective functions and evaluation metrics. Furthermore, an in-depth description and analysis of metaheuristics, benchmark datasets, and practical real-world applications are presented. Finally, in light of recent advances, this review paper provides future research opportunities.
Affiliation(s)
- José Barrera-García: Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
- Felipe Cisternas-Caneo: Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
- Broderick Crawford: Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
- Mariam Gómez Sánchez: Departamento de Electrotecnia e Informática, Universidad Técnica Federico Santa María, Federico Santa María 6090, Viña del Mar 2520000, Chile
- Ricardo Soto: Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
5
Xu X, Zhou X. Deep Learning Based Feature Selection and Ensemble Learning for Sintering State Recognition. Sensors (Basel, Switzerland) 2023; 23:9217. [PMID: 38005603 PMCID: PMC10674174 DOI: 10.3390/s23229217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 11/08/2023] [Accepted: 11/09/2023] [Indexed: 11/26/2023]
Abstract
Sintering is a commonly used agglomeration process to prepare iron ore fines for the blast furnace. The quality of sinter significantly impacts the blast furnace ironmaking process. In the vast majority of sintering plants, the judgment of sintering quality still relies on operators' intuitive observation of the cross section at the tail of the sintering machine, which is susceptible to the external environment and the experience of operators. In this paper, we propose a new sintering state recognition method using deep learning-based feature selection and ensemble learning. First, features from the infrared thermal images of the sinter cross section at the tail of the sintering machine are extracted based on ResNeXt. Then, to eliminate the irrelevant, redundant and noisy features, an efficient feature selection method based on the binary state transition algorithm (BSTA) is proposed to find the truly useful features. Subsequently, an ensemble learning (EL) method based on group decision making (GDM) is proposed to recognize the sintering states. Novel combination strategies considering the varying performance of the base learners are designed to further improve recognition accuracy. Industrial experiments conducted at a steel plant verify the effectiveness and superiority of the proposed method.
Affiliation(s)
- Xinran Xu: School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xiaojun Zhou: School of Automation, Central South University, Changsha 410083, China
6
Bao T, Wang C, Yang P, Xie SQ, Zhang ZQ, Zhou P. LSTM-AE for Domain Shift Quantification in Cross-Day Upper-Limb Motion Estimation Using Surface Electromyography. IEEE Trans Neural Syst Rehabil Eng 2023; 31:2570-2580. [PMID: 37252871 DOI: 10.1109/tnsre.2023.3281455] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/01/2023]
Abstract
Although deep learning (DL) techniques have been extensively researched in upper-limb myoelectric control, system robustness in cross-day applications is still very limited. This is largely caused by non-stable and time-varying properties of surface electromyography (sEMG) signals, resulting in domain shift impacts on DL models. To this end, a reconstruction-based method is proposed for domain shift quantification. Herein, a prevalent hybrid framework that combines a convolutional neural network (CNN) and a long short-term memory network (LSTM), i.e. CNN-LSTM, is selected as the backbone. The pairing of an auto-encoder (AE) and LSTM, abbreviated as LSTM-AE, is proposed to reconstruct CNN features. Based on reconstruction errors (RErrors) of LSTM-AE, domain shift impacts on CNN-LSTM can be quantified. For a thorough investigation, experiments were conducted in both hand gesture classification and wrist kinematics regression, with sEMG data collected over multiple days in both cases. Experiment results illustrate that, when the estimation accuracy degrades substantially in between-day testing sets, RErrors increase accordingly and can be distinct from those obtained in within-day datasets. According to data analysis, CNN-LSTM classification/regression outcomes are strongly associated with LSTM-AE errors. The average Pearson correlation coefficients could reach -0.986 ± 0.014 and -0.992 ± 0.011, respectively.
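The association reported in this abstract is a Pearson correlation between model accuracy and reconstruction error. A minimal sketch of that computation follows, assuming hypothetical per-session values; the function name `pearson_r` and the numbers are illustrative only and are not data or code from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: reconstruction error rises as accuracy falls.
reconstruction_errors = [0.10, 0.15, 0.22, 0.30, 0.41]
accuracies = [0.95, 0.91, 0.84, 0.77, 0.66]
r = pearson_r(reconstruction_errors, accuracies)
```

A strongly negative `r` on such data would mirror the direction of the relationship the abstract describes: larger reconstruction error, lower estimation accuracy.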
7
Du Y, Zhou X, Huang T, Yang C. A hierarchical evolution of neural architecture search method based on state transition algorithm. Int J Mach Learn Cyb 2023. [DOI: 10.1007/s13042-023-01794-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
8
Huang Z, Yang C, Zhou X, Gui W, Huang T. Brain-inspired STA for parameter estimation of fractional-order memristor-based chaotic systems. Appl Intell 2023. [DOI: 10.1007/s10489-022-04435-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023]
9
Zhu Y, Li W, Li T. A hybrid Artificial Immune optimization for high-dimensional feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
10
An Efficient Hybrid Feature Selection Method Using the Artificial Immune Algorithm for High-Dimensional Data. Computational Intelligence and Neuroscience 2022; 2022:1452301. [PMID: 36275946 PMCID: PMC9584659 DOI: 10.1155/2022/1452301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2022] [Revised: 07/31/2022] [Accepted: 08/29/2022] [Indexed: 12/02/2022]
Abstract
Feature selection provides the optimal subset of features for data mining models. However, current feature selection methods for high-dimensional data also require a better balance between feature subset quality and computational cost. In this paper, an efficient hybrid feature selection method (HFIA) based on artificial immune algorithm optimization is proposed to solve the feature selection problem of high-dimensional data. The algorithm combines filter algorithms and improved clone selection algorithms to explore the feature space of high-dimensional data. According to the target requirements of feature selection, combined with biological research results, this method introduces the lethal mutation mechanism and the Cauchy operator to improve the search performance of the algorithm. Moreover, the adaptive adjustment factor is introduced in the mutation and update phases of the algorithm. The effective combination of these mechanisms enables the algorithm to obtain a better search ability and lower computational costs. Experimental comparisons with 19 state-of-the-art feature selection methods are conducted on 25 high-dimensional benchmark datasets. The results show that the feature reduction rate for all datasets is above 99%, and the performance improvement for the classifier is between 5% and 48.33%. Compared with the five classical filtering feature selection methods, the computational cost of HFIA is lower than that of two of them, and it is far better than all five in terms of the feature reduction rate and classification accuracy improvement. Compared with the 14 hybrid feature selection methods reported in the latest literature, the average winning rates in terms of classification accuracy, feature reduction rate, and computational cost are 85.83%, 88.33%, and 96.67%, respectively.
11
Abed-alguni BH, Alawad NA, Al-Betar MA, Paul D. Opposition-based sine cosine optimizer utilizing refraction learning and variable neighborhood search for feature selection. Appl Intell 2022; 53:13224-13260. [PMID: 36247211 PMCID: PMC9547101 DOI: 10.1007/s10489-022-04201-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 09/21/2022] [Indexed: 12/03/2022]
Abstract
This paper proposes new improved binary versions of the Sine Cosine Algorithm (SCA) for the Feature Selection (FS) problem. FS is an essential machine learning and data mining task of choosing a subset of highly discriminating features from noisy, irrelevant, high-dimensional, and redundant features to best represent a dataset. SCA is a recent metaheuristic algorithm established to emulate a model based on sine and cosine trigonometric functions. It was initially proposed to tackle problems in the continuous domain. The SCA has been modified to Binary SCA (BSCA) to deal with the binary domain of the FS problem. To improve the performance of BSCA, three accumulative improved variations are proposed (i.e., IBSCA1, IBSCA2, and IBSCA3), where the last version has the best performance. IBSCA1 employs Opposition Based Learning (OBL) to help ensure a diverse population of candidate solutions. IBSCA2 improves IBSCA1 by adding Variable Neighborhood Search (VNS) and Laplace distribution to support several mutation methods. IBSCA3 improves IBSCA2 by optimizing the best candidate solution using Refraction Learning (RL), a novel OBL approach based on light refraction. For performance evaluation, 19 real-world datasets, including a COVID-19 dataset, were selected with different numbers of features, classes, and instances. Three performance measurements have been used to test the IBSCA versions: classification accuracy, number of features, and fitness values. Furthermore, the performance of the last variation, IBSCA3, is compared against 28 existing popular algorithms. Interestingly, IBSCA3 outperformed almost all comparative methods in terms of classification accuracy and fitness values. At the same time, it was ranked 15 out of 19 in terms of number of features. The overall simulation and statistical results indicate that IBSCA3 performs better than the other algorithms.
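The Opposition Based Learning step mentioned in this abstract can be illustrated with the standard textbook formulation, in which the opposite of a point `x` in `[lower, upper]` is `lower + upper - x`. The sketch below is a generic continuous-domain version under that assumption; the function names, the sphere fitness, and the bounds are hypothetical, and it does not reproduce the paper's binary IBSCA variants or its refraction learning.

```python
import random

def opposite_point(x, lower, upper):
    """Opposition-based learning: reflect each coordinate within [lower, upper]."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]

def obl_initialize(pop_size, lower, upper, fitness, rng):
    """Sample a random population, add each member's opposite, keep the fittest half."""
    population = [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(pop_size)
    ]
    candidates = population + [opposite_point(x, lower, upper) for x in population]
    candidates.sort(key=fitness)  # lower fitness is better in this sketch
    return candidates[:pop_size]

def sphere(x):
    """Hypothetical fitness function used only for illustration."""
    return sum(v * v for v in x)

rng = random.Random(0)
lower, upper = [-5.0] * 3, [5.0] * 3
pop = obl_initialize(6, lower, upper, sphere, rng)
```

Evaluating both a candidate and its opposite doubles the number of fitness evaluations at initialization, which is the usual cost of the added diversity.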
Affiliation(s)
- Mohammed Azmi Al-Betar: Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
- David Paul: School of Science and Technology, University of New England, Armidale, Australia
12
Xing Y, Kochunov P, van Erp TG, Ma T, Calhoun VD, Du Y. A novel neighborhood rough set-based feature selection method and its application to biomarker identification of schizophrenia. IEEE J Biomed Health Inform 2022; 27:215-226. [PMID: 36201411 PMCID: PMC10076451 DOI: 10.1109/jbhi.2022.3212479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
Feature selection can disclose biomarkers of mental disorders that have unclear biological mechanisms. Although neighborhood rough set (NRS) has been applied to discover important sparse features, it has hardly ever been utilized in neuroimaging-based biomarker identification, probably due to the inadequate feature evaluation metric and incomplete information provided under a single-granularity. Here, we propose a new NRS-based feature selection method and successfully identify brain functional connectivity biomarkers of schizophrenia (SZ) using functional magnetic resonance imaging (fMRI) data. Specifically, we develop a new weighted metric based on NRS combined with information entropy to evaluate the capacity of features in distinguishing different groups. Inspired by multi-granularity information maximization theory, we further take advantage of the complementary information from different neighborhood sizes via a multi-granularity fusion to obtain the most discriminative and stable features. For validation, we compare our method with six popular feature selection methods using three public omics datasets as well as resting-state fMRI data of 393 SZ patients and 429 healthy controls. Results show that our method obtained higher classification accuracies on both omics data (100.0%, 88.6%, and 72.2% for three omics datasets, respectively) and fMRI data (93.9% for main dataset, and 76.3% and 83.8% for two independent datasets, respectively). Moreover, our findings reveal biologically meaningful substrates of SZ, notably involving the connectivity between the thalamus and superior temporal gyrus as well as between the postcentral gyrus and calcarine gyrus. Taken together, we propose a new NRS-based feature selection method that shows the potential of exploring effective and sparse neuroimaging-based biomarkers of mental disorders.
Affiliation(s)
- Ying Xing: School of Computer and Information Technology, Shanxi University, Taiyuan, China
- Peter Kochunov: Maryland Psychiatric Research Center and Department of Psychiatry, University of Maryland, School of Medicine, Baltimore, MD, USA
- Theo G.M. van Erp: Department of Psychiatry and Human Behavior, School of Medicine, University of California, Irvine, CA, USA
- Tianzhou Ma: Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
- Vince D. Calhoun: Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, USA
- Yuhui Du: School of Computer and Information Technology, Shanxi University, Taiyuan, China
13
Song XF, Zhang Y, Gong DW, Gao XZ. A Fast Hybrid Feature Selection Based on Correlation-Guided Clustering and Particle Swarm Optimization for High-Dimensional Data. IEEE Transactions on Cybernetics 2022; 52:9573-9586. [PMID: 33729976 DOI: 10.1109/tcyb.2021.3061152] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 06/12/2023]
Abstract
The "curse of dimensionality" and the high computational cost still limit the application of evolutionary algorithms in high-dimensional feature selection (FS) problems. This article proposes a new three-phase hybrid FS algorithm based on correlation-guided clustering and particle swarm optimization (PSO) (HFS-C-P) to tackle the above two problems at the same time. To this end, three kinds of FS methods are effectively integrated into the proposed algorithm based on their respective advantages. In the first and second phases, a filter FS method and a feature clustering-based method with low computational cost are designed to reduce the search space used by the third phase. After that, the third phase focuses on finding an optimal feature subset by using an evolutionary algorithm with global search ability. Moreover, a symmetric uncertainty-based feature deletion method, a fast correlation-guided feature clustering strategy, and an improved integer PSO are developed to improve the performance of the three phases, respectively. Finally, the proposed algorithm is validated on 18 publicly available real-world datasets in comparison with nine FS algorithms. Experimental results show that the proposed algorithm can obtain a good feature subset with the lowest computational cost.
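Symmetric uncertainty, the measure this abstract names for its feature deletion phase, has a standard definition for discrete variables: SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The sketch below computes it from empirical frequencies; the function names are illustrative, and how the paper thresholds or applies the score is not stated in the abstract, so nothing here should be read as the authors' procedure.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a score in [0, 1]."""
    h_x, h_y = entropy(xs), entropy(ys)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; no information to share
    h_xy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy
    return 2.0 * mutual_info / (h_x + h_y)

# Toy discrete feature and class labels (hypothetical data).
feature = [0, 1, 0, 1]
labels = [0, 1, 0, 1]
su_score = symmetric_uncertainty(feature, labels)
```

A filter stage could rank features by this score against the class label and drop the lowest-scoring ones before any search-based phase runs.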
14
Zhang Y, Hu Y, Gao X, Gong D, Guo Y, Gao K, Zhang W. An embedded vertical-federated feature selection algorithm based on particle swarm optimisation. CAAI Transactions on Intelligence Technology 2022. [DOI: 10.1049/cit2.12122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Affiliation(s)
- Yong Zhang: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China; The Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Ying Hu: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Xiaozhi Gao: School of Computing, University of Eastern Finland, Kuopio, Finland
- Dunwei Gong: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Yinan Guo: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Kaizhou Gao: The Macau Institute of Systems Engineering, Macau University of Science and Technology, Taipa, China
- Wanqiu Zhang: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
15
Chen K, Xue B, Zhang M, Zhou F. An Evolutionary Multitasking-Based Feature Selection Method for High-Dimensional Classification. IEEE Transactions on Cybernetics 2022; 52:7172-7186. [PMID: 33382668 DOI: 10.1109/tcyb.2020.3042243] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/12/2023]
Abstract
Feature selection (FS) is an important data preprocessing technique in data mining and machine learning, which aims to select a small subset of informative features to increase performance and reduce dimensionality. Particle swarm optimization (PSO) has been successfully applied to FS because it is efficient and easy to implement. However, most existing PSO-based FS methods tend to become trapped in local optima and are computationally expensive on high-dimensional data. Multifactorial optimization (MFO), as an effective evolutionary multitasking paradigm, has been widely used for solving complex problems through implicit knowledge transfer between related tasks. Inspired by MFO, this study proposes a novel PSO-based FS method to solve high-dimensional classification via information sharing between two related tasks generated from a dataset. To be specific, two related tasks about the target concept are established by evaluating the importance of features. A new crossover operator, called assortative mating, is applied to share information between these two related tasks. In addition, two mechanisms, a variable-range strategy and a subset updating mechanism, are also developed to reduce the search space and maintain the diversity of the population, respectively. The results show that the proposed FS method can achieve higher classification accuracy with a smaller feature subset in a reasonable time than the state-of-the-art FS methods on the examined high-dimensional classification problems.
16
Application of ANN in Induction-Motor Fault-Detection System Established with MRA and CFFS. Mathematics 2022. [DOI: 10.3390/math10132250] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
Abstract
This paper proposes a fault-detection system for faulty induction motors (bearing faults, interturn shorts, and broken rotor bars) based on multiresolution analysis (MRA), correlation and fitness values-based feature selection (CFFS), and artificial neural network (ANN). First, this study compares two feature-extraction methods: the MRA and the Hilbert Huang transform (HHT) for induction-motor-current signature analysis. Furthermore, feature-selection methods are compared to reduce the number of features and maintain the best accuracy of the detection system to lower operating costs. Finally, the proposed detection system is tested with additive white Gaussian noise, and the signal-processing method and feature-selection method with good performance are selected to establish the best detection system. According to the results, features extracted from MRA can achieve better performance than HHT using CFFS and ANN. In the proposed detection system, CFFS significantly reduces the operation cost (95% of the number of features) and maintains 93% accuracy using ANN.
17
An enhanced binary Rat Swarm Optimizer based on local-best concepts of PSO and collaborative crossover operators for feature selection. Comput Biol Med 2022; 147:105675. [PMID: 35687926 DOI: 10.1016/j.compbiomed.2022.105675] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/16/2022] [Revised: 05/24/2022] [Accepted: 05/26/2022] [Indexed: 11/22/2022]
Abstract
In this paper, an enhanced binary version of the Rat Swarm Optimizer (RSO) is proposed to deal with Feature Selection (FS) problems. FS is an important data reduction step in data mining which finds the most representative features from the entire data. Many FS-based swarm intelligence algorithms have been used to tackle FS. However, the door is still open for further investigations since no FS method gives cutting-edge results for all cases. In this paper, a recent swarm intelligence metaheuristic method called RSO which is inspired by the social and hunting behavior of a group of rats is enhanced and explored for FS problems. The binary enhanced RSO is built based on three successive modifications: i) an S-shape transfer function is used to develop binary RSO algorithms; ii) the local search paradigm of particle swarm optimization is used with the iterative loop of RSO to boost its local exploitation; iii) three crossover mechanisms are used and controlled by a switch probability to improve the diversity. Based on these enhancements, three versions of RSO are produced, referred to as Binary RSO (BRSO), Binary Enhanced RSO (BERSO), and Binary Enhanced RSO with Crossover operators (BERSOC). To assess the performance of these versions, a benchmark of 24 datasets from various domains is used. The proposed methods are assessed concerning the fitness value, number of selected features, classification accuracy, specificity, sensitivity, and computational time. The best performance is achieved by BERSOC followed by BERSO and then BRSO. These proposed versions are comparatively assessed against 25 well-regarded metaheuristic methods and five filter-based approaches. The obtained results underline their superiority by producing new best results for some datasets.
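The first modification listed in this abstract, an S-shape transfer function, is commonly realized as a sigmoid that turns each continuous position component into the probability of selecting the corresponding feature. The sketch below shows that common formulation only; the names `s_shape` and `binarize`, the seed, and the sample position are hypothetical, and the abstract does not confirm which exact sigmoid variant the authors use.

```python
import math
import random

def s_shape(value):
    """Sigmoid transfer function mapping a real value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))

def binarize(position, rng):
    """Set each bit to 1 with probability s_shape(component), else 0."""
    return [1 if rng.random() < s_shape(v) else 0 for v in position]

rng = random.Random(42)
# Hypothetical continuous position of one search agent over five features.
continuous_position = [-6.0, -0.5, 0.0, 0.5, 6.0]
feature_mask = binarize(continuous_position, rng)
```

In `feature_mask`, a 1 marks a selected feature; strongly negative components are rarely selected and strongly positive ones almost always are.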
|
18
|
Dong Y, Zhang H, Wang C, Zhou X. An adaptive state transition algorithm with local enhancement for global optimization. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108733] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/02/2022]
|
19
|
Fan H, Xue L, Song Y, Li M. A repetitive feature selection method based on improved ReliefF for missing data. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03327-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
|
20
|
Nonlinear bilevel programming approach for decentralized supply chain using a hybrid state transition algorithm. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108119] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
|
21
|
Binary Horse herd optimization algorithm with crossover operators for feature selection. Comput Biol Med 2021; 141:105152. [PMID: 34952338 DOI: 10.1016/j.compbiomed.2021.105152] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/29/2021] [Revised: 12/11/2021] [Accepted: 12/14/2021] [Indexed: 01/30/2023]
Abstract
This paper proposes a binary version of the Horse herd Optimization Algorithm (HOA) to tackle Feature Selection (FS) problems. The algorithm mimics the behavior of a herd of horses trying to survive. To build the binary version of HOA, referred to as BHOA, two adjustments were made: i) three transfer functions, namely S-shape, V-shape and U-shape, are used to transform the continuous domain into a binary one, and four configurations of each transfer function are studied to yield four alternatives; ii) three crossover operators (one-point, two-point and uniform) are suggested to improve the efficiency of the proposed method in the FS domain. The performance of the proposed fifteen BHOA versions is examined using 24 real-world FS datasets. Six metrics were used to evaluate the optimization methods: accuracy, number of features selected, fitness values, sensitivity, specificity and computational time. The best-performing version is BHOA with S-shape and one-point crossover. A comparative evaluation was also carried out against 21 state-of-the-art methods. The proposed method achieves very competitive results, some of which are the best recorded. Given its viability, the proposed method can be further considered in other areas of machine learning.
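The three crossover operators named in the abstract are standard genetic operators on binary masks. The sketch below shows their textbook forms on two parent feature masks; the function names are hypothetical, and how the paper wires them into the optimizer is not shown here.

```python
import random

def one_point(a, b, rng):
    """Swap the tails of two parents after a single random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def two_point(a, b, rng):
    """Swap the middle segment between two random cut points (needs length >= 3)."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def uniform(a, b, rng):
    """Swap each position independently with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2

rng = random.Random(42)
parent_a = [1, 1, 1, 1, 1, 1]  # toy mask: all features selected
parent_b = [0, 0, 0, 0, 0, 0]  # toy mask: no features selected
for op in (one_point, two_point, uniform):
    print(op.__name__, op(parent_a, parent_b, rng))
```

With these complementary toy parents, every position of the two children still holds one bit from each parent, which makes it easy to see where each operator cut.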
|
22
|
Bai D, Liu T, Han X, Yi H. Application Research on Optimization Algorithm of sEMG Gesture Recognition Based on Light CNN+LSTM Model. CYBORG AND BIONIC SYSTEMS 2021; 2021:9794610. [PMID: 36285146 PMCID: PMC9494710 DOI: 10.34133/2021/9794610] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/12/2021] [Accepted: 09/29/2021] [Indexed: 12/02/2022] Open
Abstract
Deep learning gesture recognition based on surface electromyography (sEMG) plays an increasingly important role in human-computer interaction. To ensure high accuracy of deep learning in multistate muscle action recognition, and to ensure that the trained model can be deployed on an embedded chip with small storage space, this paper presents a feature model construction and optimization method based on a multichannel sEMG amplification unit. The feature model is built from multidimensional sequential sEMG images by combining a convolutional neural network and a long short-term memory network to solve the problem of multistate sEMG signal recognition. The experimental results show that, under the same network structure, sEMG signals processed with the fast Fourier transform and root mean square as features yield a good recognition rate: the recognition accuracy for complex gestures is 91.40%, with a model size of 1 MB. The model can therefore control the artificial hand accurately while remaining small and precise.
Affiliation(s)
- Dianchun Bai: School of Electrical Engineering, Shenyang University of Technology, Shenyang 110870, China; Department of Mechanical Engineering and Intelligent Systems, University of Electro-Communications, Tokyo 182-8585, Japan
- Tie Liu: School of Electrical Engineering, Shenyang University of Technology, Shenyang 110870, China
- Xinghua Han: School of Electrical Engineering, Shenyang University of Technology, Shenyang 110870, China
- Hongyu Yi: School of Electrical Engineering, Shenyang University of Technology, Shenyang 110870, China
|
23
|
Li Z, Du J, Nie B, Xiong W, Xu G, Luo J. A new two-stage hybrid feature selection algorithm and its application in Chinese medicine. INT J MACH LEARN CYB 2021. [DOI: 10.1007/s13042-021-01445-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
24
|
Taherkhani N, Sepehri MM, Khasha R, Shafaghi S. Determining the Level of Importance of Variables in Predicting Kidney Transplant Survival Based on a Novel Ranking Method. Transplantation 2021; 105:2307-2315. [PMID: 33534528 DOI: 10.1097/tp.0000000000003623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
Abstract
BACKGROUND Kidney transplantation is the best alternative treatment for end-stage renal disease. To make optimal use of donated kidneys, predicted graft survival can be used as a factor in allocating kidneys. The performance of prediction techniques is highly dependent on the correct selection of predictors. Hence, the main objective of this research is to propose a novel method for ranking the effective variables for predicting kidney transplant survival. METHODS Five classification models were used to classify kidney recipients into long- and short-term survival classes. Synthetic minority oversampling and random undersampling were used to overcome the imbalanced class problem. Missing values were handled with 2 approaches (eliminating and imputing them). All variables were categorized into 4 levels. The ranking was evaluated using a sensitivity analysis approach. RESULTS Thirty-four of the 41 variables were identified as important variables, of which 5 variables were categorized in the very important level ("Recipient creatinine at discharge," "Recipient dialysis time," "Donor history of diabetes," "Donor kidney biopsy," and "Donor cause of death"), 17 variables in the important level, and 12 variables in the low important level. CONCLUSIONS In this study, we identify new variables that have not been addressed in any previous study (eg, AGE_DIF and MATCH_GEN). In kidney allocation systems, 2 main criteria are considered: equity and utility. One of the utility subcriteria is graft survival. Our findings can be used in the design of systems to predict graft survival.
Affiliation(s)
- Nasrin Taherkhani: Faculty Member of Computer Engineering, Payam-e-Noor University, Saveh, Iran
- Mohammad Mehdi Sepehri: Department of Healthcare Systems Engineering, Faculty of Industrial and Systems Engineering, Tarbiat Modares University, Tehran, Iran
- Roghaye Khasha: Center of Excellence in Healthcare Systems Engineering, Tarbiat Modares University, Tehran, Iran
- Shadi Shafaghi: Lung Transplantation Research Center, National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
25
|
|
26
|
Functional deep echo state network improved by a bi-level optimization approach for multivariate time series classification. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107314] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
|
27
|
Xu Z, Shen D, Kou Y, Nie T. A hybrid feature selection algorithm combining ReliefF and Particle swarm optimization for high-dimensional medical data. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-202948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Due to high-dimensional features and strong correlation among features, the classification accuracy on medical data is often lower than expected. Feature selection is a common approach to this problem: it selects effective features and thereby reduces the dimensionality of high-dimensional data. However, traditional feature selection algorithms set thresholds blindly, and their search algorithms are liable to fall into local optima. To address this, this paper proposes a hybrid feature selection algorithm combining ReliefF and Particle Swarm Optimization. The algorithm has three parts. First, ReliefF is used to calculate the feature weights, and the features are ranked by weight. Then the ranked features are grouped by density equalization, so that the density of features in each group is the same. Finally, the Particle Swarm Optimization algorithm is used to search the ranked feature groups, and feature selection is performed according to a new fitness function. Experimental results show that random forest has the highest classification accuracy on the selected features and, more importantly, uses the fewest features. In addition, experimental results on 2 medical datasets show that the average accuracy of random forest reaches 90.20%, which shows that the hybrid algorithm has practical application value.
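The first stage described above (weight features, then rank them) can be sketched with the basic Relief rule. This is a deliberately simplified stand-in: it uses a single nearest hit and nearest miss for a two-class problem, whereas ReliefF averages over k neighbours and handles multiple classes. The function name `relief_weights` and the toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief weights (one nearest hit and one nearest miss per sample).

    Assumptions: numeric features, two classes, at least two samples per class.
    A feature gains weight when it differs on the nearest miss and loses weight
    when it differs on the nearest hit.
    """
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]  # avoid division by zero

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda j: -weights[j])
print(weights, ranking)
```

The resulting ranking is what a later search stage, such as the grouped Particle Swarm Optimization search in the abstract, would take as input.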
Affiliation(s)
- Zhaozhao Xu: School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Derong Shen: School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Yue Kou: School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Tiezheng Nie: School of Computer Science and Engineering, Northeastern University, Shenyang, China
|
28
|
Feng Y, Wang X, Zhang J. A heterogeneous ensemble learning method for neuroblastoma survival prediction. IEEE J Biomed Health Inform 2021; 26:1472-1483. [PMID: 33848254 DOI: 10.1109/jbhi.2021.3073056] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
Abstract
Neuroblastoma is a pediatric cancer with high morbidity and mortality. Accurate survival prediction of patients with neuroblastoma plays an important role in the formulation of treatment plans. In this study, we proposed a heterogeneous ensemble learning method to predict the survival of neuroblastoma patients and extract decision rules from the proposed method to assist doctors in making decisions. After data preprocessing, five heterogeneous base learners were developed: decision tree, random forest, support vector machine based on genetic algorithm, extreme gradient boosting and light gradient boosting machine. Subsequently, a heterogeneous feature selection method was devised to obtain the optimal feature subset of each base learner, and this subset guided the construction of the base learner as a priori knowledge. Furthermore, an area under curve-based ensemble mechanism was proposed to integrate the five heterogeneous base learners. Finally, the proposed method was compared with mainstream machine learning methods on different indicators, and valuable information was extracted from the proposed method by using partial dependency plot analysis and rule extraction. Experimental results show that the proposed method achieves an accuracy of 91.64%, recall of 91.14%, and AUC of 91.35% and is significantly better than the mainstream machine learning methods. In addition, interpretable rules with accuracy higher than 0.900 and predicted responses are extracted from the proposed method. Our study can effectively improve the performance of the clinical decision support system to improve the survival of neuroblastoma patients.
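The abstract does not spell out the AUC-based ensemble mechanism. One plausible reading, shown here purely as an assumption, is to weight each base learner's predicted probabilities by its validation AUC. The function names and the toy scores are hypothetical.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def auc_weighted_average(prob_lists, aucs):
    """Combine per-model probabilities with weights proportional to each model's AUC."""
    total = sum(aucs)
    n = len(prob_lists[0])
    return [sum(w * p[i] for w, p in zip(aucs, prob_lists)) / total for i in range(n)]

# Toy validation labels and two hypothetical base learners' probabilities.
y_val = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]
model_b = [0.2, 0.3, 0.6, 0.9]

aucs = [auc(y_val, model_a), auc(y_val, model_b)]
combined = auc_weighted_average([model_a, model_b], aucs)
print(aucs, auc(y_val, combined))
```

In practice the weights would be estimated on held-out data rather than on the same samples being combined, to avoid rewarding overfitted learners.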
|
29
|
Zhou X, Zhang R, Yang K, Yang C, Huang T. Using hybrid normalization technique and state transition algorithm to VIKOR method for influence maximization problem. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.084] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 10/24/2022]
|
30
|
|
31
|
Establish Induction Motor Fault Diagnosis System Based on Feature Selection Approaches with MRA. Processes (Basel) 2020. [DOI: 10.3390/pr8091055] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022] Open
Abstract
This paper proposes a feature selection (FS) approach, namely, correlation and fitness value-based feature selection (CFFS). CFFS is an improved version of correlation-based feature selection (CFS), aimed at the common failure cases of the induction motor. CFFS is used to build an induction motor fault detection (FD) system with an artificial neural network (ANN). This study analyzes the current signal of the induction motor with multiresolution analysis (MRA), extracts the features, and uses feature selection approaches (ReliefF, CFS, and CFFS) to reduce the number of features while maintaining the accuracy of the induction motor fault detection system. Finally, the induction motor fault detection system is trained on the features selected by each approach. The best induction motor fault detection system is established by comparing the efficiency of these FS approaches.
|
32
|
|
33
|
Key Quality Indicators Prediction for Web Browsing with Embedded Filter Feature Selection. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10062141] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
In this paper, the prediction of over-the-top service quality is discussed, which is a promising way for mobile network engineers to tackle service deterioration as early as possible. Currently, traditional mobile network operation often takes remedial measures only when customer complaints about service problems are received. With the popularity of over-the-top services, this problem has become increasingly serious. Based on the service perception data crowd-sensed from massive smartphones in the mobile network, we first investigated the application of multi-label ReliefF, a well-known method of feature selection, in determining the feature weights of the perception data and proposed a unified multi-label ReliefF (UML-ReliefF) algorithm. Then a feature-weighted multi-label k-nearest neighbor (ML-kNN) algorithm is proposed for key quality indicators (KQI) prediction, combining UML-ReliefF and ML-kNN in the learning. The experimental results for the web browsing service show that UML-ReliefF can effectively identify the most influential features of the data and thus lead to better performance for KQI prediction. The experiments also show that the feature-weighted KQI prediction is superior to its unweighted counterpart, since the former takes full advantage of all the features in the learning. Although there is still much room for improvement in the precision of the prediction, the proposed method has high potential for helping network engineers detect the deterioration of service quality promptly and take measures before it is too late.
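The idea of feature weighting in a nearest-neighbour method can be sketched as a weighted Euclidean distance. The example below only shows how weights change which neighbours are retrieved; it is not ML-kNN, and the weights, data, and function names are hypothetical rather than taken from the paper.

```python
import math

def weighted_distance(a, b, w):
    """Euclidean distance in which feature j contributes in proportion to w[j]."""
    return math.sqrt(sum(wj * (x - y) ** 2 for x, y, wj in zip(a, b, w)))

def nearest(query, X, w, k):
    """Indices of the k training samples closest to query under the weighted distance."""
    order = sorted(range(len(X)), key=lambda i: weighted_distance(query, X[i], w))
    return order[:k]

X = [[0.0, 5.0], [1.0, 0.0], [0.2, 9.0]]  # toy samples
query = [0.0, 0.0]

print(nearest(query, X, [1.0, 1.0], 1))  # unweighted: sample 1 is closest
print(nearest(query, X, [1.0, 0.0], 2))  # second feature ignored: samples 0 and 2
```

Setting a weight to zero removes a feature from the neighbourhood search entirely, while intermediate weights let every feature contribute in proportion to its estimated relevance.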
|
34
|
Han J, Yang C, Lim CC, Zhou X, Shi P, Gui W. Power scheduling optimization under single-valued neutrosophic uncertainty. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.11.089] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/25/2022]
|
35
|
Huang Z, Yang C, Zhou X, Yang S. Energy Consumption Forecasting for the Nonferrous Metallurgy Industry Using Hybrid Support Vector Regression with an Adaptive State Transition Algorithm. Cognit Comput 2019. [DOI: 10.1007/s12559-019-09644-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
|
36
|
Huang Z, Yang C, Chen X, Huang K, Xie Y. Adaptive over-sampling method for classification with application to imbalanced datasets in aluminum electrolysis. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04208-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
|