1
|
Yang QT, Xu XX, Zhan ZH, Zhong J, Kwong S, Zhang J. Evolutionary Multitask Optimization for Multiform Feature Selection in Classification. IEEE TRANSACTIONS ON CYBERNETICS 2025; 55:1673-1686. [PMID: 40031579 DOI: 10.1109/tcyb.2025.3535722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Feature selection (FS) is a significant research topic in machine learning and artificial intelligence, but it becomes complicated in the high dimensional search space due to the vast number of features. Evolutionary computation (EC) has been widely used in solving FS by modeling it as an expensive wrapper-form optimization task, where a classifier is used to obtain classification accuracy for fitness evaluation (FE). In this article, we propose that the FS problem can also be modeled as a cheap filter-form optimization task, where the FE is based on the relevance and redundancy of the selected features. The wrapper-form optimization task is beneficial for classification accuracy, while the filter-form optimization task has the strength of a lighter computational cost. Therefore, different from existing multitask-based FS that uses various wrapper-form optimization tasks, this article uses a multiform optimization technique to model the FS problem as a wrapper-form optimization task and a filter-form optimization task simultaneously. An evolutionary multitask FS (EMTFS) algorithm for tackling these two tasks in parallel is then proposed, in which a two-channel knowledge transfer strategy transfers positive knowledge across the two tasks. Experiments on widely used public datasets show that EMTFS selects as few features as possible while achieving classification accuracy superior to that of the compared state-of-the-art FS algorithms.
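As a rough illustration of a filter-form fitness built from relevance and redundancy, the sketch below scores a feature subset without training any classifier. The abstract does not state which relevance or redundancy measures EMTFS uses, so the use of absolute Pearson correlation, the difference form `relevance - redundancy`, and the names `pearson` and `filter_fitness` are all assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def filter_fitness(columns, labels, mask):
    """Assumed filter-form score: mean relevance minus mean pairwise redundancy.

    columns: list of feature columns, labels: class labels encoded as numbers,
    mask: 0/1 flags marking which features are selected. Higher is better.
    """
    selected = [c for c, keep in zip(columns, mask) if keep]
    if not selected:
        return float("-inf")  # an empty subset is treated as invalid
    relevance = sum(abs(pearson(c, labels)) for c in selected) / len(selected)
    pairs = [(a, b) for i, a in enumerate(selected) for b in selected[i + 1:]]
    redundancy = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return relevance - redundancy


# Toy data invented for the example.
columns = [
    [0.0, 1.0, 0.0, 1.0],  # f0: matches the labels
    [0.0, 1.0, 0.0, 1.0],  # f1: exact duplicate of f0 (fully redundant)
    [1.0, 1.0, 0.0, 0.0],  # f2: uncorrelated with the labels
]
labels = [0.0, 1.0, 0.0, 1.0]

score_single = filter_fitness(columns, labels, [1, 0, 0])
score_duplicate = filter_fitness(columns, labels, [1, 1, 0])
```

On this toy data, selecting the duplicate feature alongside `f0` lowers the score, which is the behaviour a redundancy term is meant to produce. A wrapper-form fitness would instead train and evaluate a classifier on the selected columns, which is why it costs more per evaluation.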
|
2
|
Du Y, Zhou X, Gao Q, Yang C, Huang T. A Deep Reinforcement Learning-Based Feature Selection Method for Invasive Disease Event Prediction Using Imbalanced Follow-Up Data. IEEE J Biomed Health Inform 2025; 29:1472-1483. [PMID: 40030195 DOI: 10.1109/jbhi.2024.3497325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2025]
Abstract
The machine learning-based model is a promising paradigm for predicting invasive disease events (iDEs) in breast cancer. Feature selection (FS) is an essential preprocessing technique employed to identify the pertinent features for the prediction model. However, conventional FS methods often fail with imbalanced clinical data due to the bias towards the majority class. In this paper, a novel FS framework based on reinforcement learning (RLFS) is developed to identify the optimal feature subset for the imbalanced data. The RLFS employs an iterative methodology, wherein a data resampling technique generates a balanced dataset before each iteration. A decision network is trained using a deep RL algorithm to identify the relevant features for the dataset in the current iteration. With such an iterative training strategy, numerous constructed datasets gradually boost the FS capacity of the decision network, resulting in robust performance on imbalanced data. Finally, a weighted model is proposed to determine the most suitable FS solution. The RLFS is employed to predict breast cancer iDEs using real follow-up data. The comparison results demonstrate that RLFS effectively reduces the number of features while outperforming several state-of-the-art FS algorithms.
|
3
|
Sun Y, Li P, Xu H, Wang R. Structural prior-driven feature extraction with gradient-momentum combined optimization for convolutional neural network image classification. Neural Netw 2024; 179:106511. [PMID: 39146718 DOI: 10.1016/j.neunet.2024.106511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 06/12/2024] [Accepted: 07/03/2024] [Indexed: 08/17/2024]
Abstract
Recent image classification efforts have achieved certain success by incorporating prior information such as labels and logical rules to learn discriminative features. However, these methods overlook the variability of features, resulting in feature inconsistency and fluctuations in model parameter updates, which further contribute to decreased image classification accuracy and model instability. To address this issue, this paper proposes a novel method combining structural prior-driven feature extraction with gradient-momentum (SPGM), from the perspectives of consistent feature learning and precise parameter updates, to enhance the accuracy and stability of image classification. Specifically, SPGM leverages a structural prior-driven feature extraction (SPFE) approach to calculate gradients of multi-level features and original images to construct structural information, which is then transformed into prior knowledge to drive the network to learn features consistent with the original images. Additionally, an optimization strategy integrating gradients and momentum (GMO) is introduced, dynamically adjusting the direction and step size of parameter updates based on the angle and norm of the sum of gradients and momentum, enabling precise model parameter updates. Extensive experiments on CIFAR10 and CIFAR100 datasets demonstrate that the SPGM method significantly reduces the top-1 error rate in image classification, enhances the classification performance, and outperforms state-of-the-art methods.
Affiliation(s)
- Yunyun Sun
- School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, 210023, Jiangsu, China.
| | - Peng Li
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, Jiangsu, China; Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing, 210023, Jiangsu, China.
| | - He Xu
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, Jiangsu, China; Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing, 210023, Jiangsu, China.
| | - Ruchuan Wang
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, Jiangsu, China; Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing, 210023, Jiangsu, China.
| |
|
4
|
Benny D, Giacobini M, Catalano A, Costa G, Gnavi R, Ricceri F. A Multimorbidity Analysis of Hospitalized Patients With COVID-19 in Northwest Italy: Longitudinal Study Using Evolutionary Machine Learning and Health Administrative Data. JMIR Public Health Surveill 2024; 10:e52353. [PMID: 39024001 PMCID: PMC11294776 DOI: 10.2196/52353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 01/31/2024] [Accepted: 05/16/2024] [Indexed: 07/20/2024] Open
Abstract
BACKGROUND: Multimorbidity is a significant public health concern, characterized by the coexistence and interaction of multiple preexisting medical conditions. This complex condition has been associated with an increased risk of COVID-19. Individuals with multimorbidity who contract COVID-19 often face a significant reduction in life expectancy. The postpandemic period has also highlighted an increase in frailty, emphasizing the importance of integrating existing multimorbidity details into epidemiological risk assessments. Managing clinical data that include medical histories presents significant challenges, particularly due to the sparsity of data arising from the rarity of multimorbidity conditions. Also, the complex enumeration of combinatorial multimorbidity features introduces challenges associated with combinatorial explosions.
OBJECTIVE: This study aims to assess the severity of COVID-19 in individuals with multiple medical conditions, considering their demographic characteristics such as age and sex. We propose an evolutionary machine learning model designed to handle sparsity, analyzing preexisting multimorbidity profiles of patients hospitalized with COVID-19 based on their medical history. Our objective is to identify the optimal set of multimorbidity feature combinations strongly associated with COVID-19 severity. We also apply the Apriori algorithm to these evolutionarily derived predictive feature combinations to identify those with high support.
METHODS: We used data from 3 administrative sources in Piedmont, Italy, involving 12,793 individuals aged 45-74 years who tested positive for COVID-19 between February and May 2020. From their 5-year pre-COVID-19 medical histories, we extracted multimorbidity features, including drug prescriptions, disease diagnoses, sex, and age. Focusing on COVID-19 hospitalization, we segmented the data into 4 cohorts based on age and sex. Addressing data imbalance through random resampling, we compared various machine learning algorithms to identify the optimal classification model for our evolutionary approach. Using 5-fold cross-validation, we evaluated each model's performance. Our evolutionary algorithm, utilizing a deep learning classifier, generated prediction-based fitness scores to pinpoint multimorbidity combinations associated with COVID-19 hospitalization risk. Finally, the Apriori algorithm was applied to identify frequent combinations with high support.
RESULTS: We identified multimorbidity predictors associated with COVID-19 hospitalization, indicating more severe COVID-19 outcomes. Frequently occurring morbidity features in the final evolved combinations were age>53, R03BA (glucocorticoid inhalants), and N03AX (other antiepileptics) in cohort 1; A10BA (biguanide or metformin) and N02BE (anilides) in cohort 2; N02AX (other opioids) and M04AA (preparations inhibiting uric acid production) in cohort 3; and G04CA (alpha-adrenoreceptor antagonists) in cohort 4.
CONCLUSIONS: When combined with other multimorbidity features, even less prevalent medical conditions show associations with the outcome. This study provides insights beyond COVID-19, demonstrating how repurposed administrative data can be adapted and contribute to enhanced risk assessment for vulnerable populations.
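The notion of "support" used with the Apriori algorithm can be sketched briefly: the support of a feature combination is the fraction of records that contain every feature in it. The snippet below shows only this counting step, not Apriori's candidate generation and pruning. The function names `support` and `frequent_itemsets`, the threshold, and the toy records (with placeholder feature labels) are hypothetical and are not taken from the study.

```python
def support(itemset, records):
    """Fraction of records that contain every item of the itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for record in records if itemset <= record)
    return hits / len(records)


def frequent_itemsets(candidates, records, min_support):
    """Keep the candidate itemsets whose support reaches the threshold."""
    return [set(c) for c in candidates if support(c, records) >= min_support]


# Toy records: each one is the set of features present in one evolved combination.
# The labels "a", "b", "c" are placeholders, not real study features.
records = [
    frozenset({"a", "b"}),
    frozenset({"a"}),
    frozenset({"a", "b", "c"}),
    frozenset({"b"}),
]

support_ab = support({"a", "b"}, records)  # "a" and "b" co-occur in 2 of 4 records
kept = frequent_itemsets([{"a"}, {"a", "b"}, {"c"}], records, min_support=0.5)
```

A full Apriori run would grow candidates level by level and discard any candidate with an infrequent subset; the counting function above is the part that every level relies on.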
Affiliation(s)
- Dayana Benny
- Centre for Biostatistics, Epidemiology, and Public Health, Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
- Modeling and Data Science, Department of Mathematics, University of Turin, Turin, Italy
| | - Mario Giacobini
- Data Analysis and Modeling Unit, Department of Veterinary Sciences, University of Turin, Turin, Italy
| | - Alberto Catalano
- Centre for Biostatistics, Epidemiology, and Public Health, Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
- Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy
| | - Giuseppe Costa
- Centre for Biostatistics, Epidemiology, and Public Health, Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
| | - Roberto Gnavi
- Unit of Epidemiology, Regional Health Service, Local Health Unit Torino 3, Turin, Italy
| | - Fulvio Ricceri
- Centre for Biostatistics, Epidemiology, and Public Health, Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
| |
|
5
|
Chen Z, Ge R, Wang C, Elazab A, Fu X, Min W, Qin F, Jia G, Fan X. Identification of important gene signatures in schizophrenia through feature fusion and genetic algorithm. Mamm Genome 2024; 35:241-255. [PMID: 38512459 DOI: 10.1007/s00335-024-10034-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 02/07/2024] [Indexed: 03/23/2024]
Abstract
Schizophrenia is a debilitating psychiatric disorder that can significantly affect a patient's quality of life and lead to permanent brain damage. Although medical research has identified certain genetic risk factors, the specific pathogenesis of the disorder remains unclear. Despite the prevalence of research employing magnetic resonance imaging, few studies have focused on the gene level and gene expression profile involving a large number of screened genes. However, the high dimensionality of genetic data presents a great challenge to accurately modeling the data. To tackle the current challenges, this study presents a novel feature selection strategy that utilizes heuristic feature fusion and a multi-objective optimization genetic algorithm. The goal is to improve classification performance and identify the key gene subset for schizophrenia diagnostics. Traditional gene screening techniques are inadequate for accurately determining the precise number of key genes associated with schizophrenia. Our innovative approach integrates a filter-based feature selection method to reduce data dimensionality and a multi-objective optimization genetic algorithm for improved classification tasks. By combining the filtering and wrapper methods, our strategy leverages their respective strengths in a deliberate manner, leading to superior classification accuracy and a more efficient selection of relevant genes. This approach has demonstrated significant improvements in classification results across 11 out of 14 relevant datasets. The performance on the remaining three datasets is comparable to the existing methods. Furthermore, visual and enrichment analyses have confirmed the practicality of our proposed method as a promising tool for the early detection of schizophrenia.
Affiliation(s)
| | - Ruiquan Ge
- Hangzhou Dianzi University, Hangzhou, China.
- Hangzhou Institute of Advanced Technology, Hangzhou, China.
- Key Laboratory of Discrete Industrial Internet of Things of Zhejiang Province, Hangzhou, China.
| | - Changmiao Wang
- Shenzhen Research Institute of Big Data, Shenzhen, China
| | - Ahmed Elazab
- Computer Science Department, Misr Higher Institute for Commerce and Computers, Mansoura, Egypt
| | - Xianjun Fu
- School of Artificial Intelligence, Zhejiang College of Security Technology, Wenzhou, China
| | - Wenwen Min
- School of Information Science and Engineering, Yunnan University, Kunming, China
| | - Feiwei Qin
- Hangzhou Dianzi University, Hangzhou, China
| | | | - Xiaopeng Fan
- Hangzhou Institute of Advanced Technology, Hangzhou, China
| |
|
6
|
Qiu F, Heidari AA, Chen Y, Chen H, Liang G. Advancing forensic-based investigation incorporating slime mould search for gene selection of high-dimensional genetic data. Sci Rep 2024; 14:8599. [PMID: 38615048 PMCID: PMC11016116 DOI: 10.1038/s41598-024-59064-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 04/06/2024] [Indexed: 04/15/2024] Open
Abstract
Modern medicine has produced large genetic datasets of high dimensions through advanced gene sequencing technology, and processing these data is of great significance for clinical decision-making. Gene selection (GS) is an important data preprocessing technique that aims to select a subset of feature information to improve performance and reduce data dimensionality. This study proposes an improved wrapper GS method based on forensic-based investigation (FBI). The method introduces the search mechanism of the slime mould algorithm into FBI to improve the original algorithm; the newly proposed algorithm is named SMA_FBI. GS is then performed by converting the continuous optimizer to a binary version through a transfer function. To verify the superiority of SMA_FBI, experiments are first executed on the 30-function test set of CEC2017, and the algorithm is compared with 10 original algorithms and 10 state-of-the-art algorithms. The experimental results show that SMA_FBI is better than the other algorithms in terms of finding the optimal solution, convergence speed, and robustness. In addition, BSMA_FBI (the binary version of SMA_FBI) is compared with 8 binary algorithms on 18 high-dimensional genetic datasets from the UCI repository. The results indicate that BSMA_FBI is able to obtain high classification accuracy with fewer features selected in GS applications. Therefore, SMA_FBI is considered an optimization tool with great potential for dealing with global optimization problems, and its binary version, BSMA_FBI, can be used for GS tasks.
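Converting a continuous optimizer to a binary one through a transfer function can be sketched as follows. The abstract does not say which transfer function BSMA_FBI uses, so the S-shaped (sigmoid) function, the sampling rule, and the names `s_shaped` and `binarize` are assumptions chosen because they are a common way to do this conversion.

```python
import math
import random


def s_shaped(x):
    """Sigmoid transfer function: maps a real-valued position to a probability."""
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp for extreme inputs
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask.

    Each dimension is selected (1) with probability s_shaped(value).
    """
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


rng = random.Random(0)                   # fixed seed so the sketch is repeatable
position = [-6.0, -0.5, 0.0, 0.5, 6.0]   # hypothetical continuous solution
mask = binarize(position, rng)           # one gene-selection mask for a wrapper to evaluate
```

Strongly negative coordinates are almost never selected and strongly positive ones almost always are, so the continuous search dynamics still steer which genes end up in the subset.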
Affiliation(s)
- Feng Qiu
- Institute of Big Data and Information Technology, Wenzhou University, Wenzhou, 325035, China
| | - Ali Asghar Heidari
- School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran
| | - Yi Chen
- Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, 325035, China
| | - Huiling Chen
- Institute of Big Data and Information Technology, Wenzhou University, Wenzhou, 325035, China.
| | - Guoxi Liang
- Department of Artificial Intelligence, Wenzhou Polytechnic, Wenzhou, 325035, China.
| |
|
7
|
Li M, Cao R, Zhao Y, Li Y, Deng S. Population characteristic exploitation-based multi-orientation multi-objective gene selection for microarray data classification. Comput Biol Med 2024; 170:108089. [PMID: 38330824 DOI: 10.1016/j.compbiomed.2024.108089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Revised: 01/23/2024] [Accepted: 01/27/2024] [Indexed: 02/10/2024]
Abstract
Gene selection is a process of selecting discriminative genes from microarray data that helps to diagnose and classify cancer samples effectively. Swarm intelligence evolution-based gene selection algorithms cannot fully circumvent the problem that the population is prone to becoming trapped in local optima during gene selection. To tackle this challenge, previous research has focused primarily on two aspects: mitigating premature convergence to local optima and escaping from local optima. In contrast to these strategies, this paper introduces a novel perspective by adopting reverse thinking, where the issue of local optima is seen as an opportunity rather than an obstacle. Building on this foundation, we propose MOMOGS-PCE, a novel gene selection approach that effectively exploits the advantageous characteristics of populations trapped in local optima to uncover global optimal solutions. Specifically, MOMOGS-PCE employs a novel population initialization strategy, which initializes multiple populations that explore diverse orientations to foster distinct population characteristics. Next, an enhanced NSGA-II algorithm is used to amplify the advantageous characteristics exhibited by each population. Finally, a novel exchange strategy is proposed to facilitate the transfer of characteristics between populations that have reached near maturity in evolution, thereby promoting further population evolution and enhancing the search for better gene subsets. The experimental results demonstrate that MOMOGS-PCE exhibits significant advantages in comprehensive indicators compared with six competitive multi-objective gene selection algorithms. This confirms that the "reverse-thinking" approach not only avoids local optima but also leverages them to uncover superior gene subsets for cancer diagnosis.
Affiliation(s)
- Min Li
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang, Jiangxi, PR China.
| | - Rutun Cao
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang, Jiangxi, PR China
| | - Yangfan Zhao
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang, Jiangxi, PR China
| | - Yulong Li
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang, Jiangxi, PR China
| | - Shaobo Deng
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang, Jiangxi, PR China
| |
|
8
|
Jiao R, Xue B, Zhang M. Benefiting From Single-Objective Feature Selection to Multiobjective Feature Selection: A Multiform Approach. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:7773-7786. [PMID: 36346857 DOI: 10.1109/tcyb.2022.3218345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Evolutionary multiobjective feature selection (FS) has gained increasing attention in recent years. However, it still faces some challenges: for example, duplicated solutions that frequently appear in either the search space or the objective space lead to a loss of population diversity, and the huge search space results in low search efficiency. Minimizing the number of selected features and maximizing the classification performance are two major objectives in FS. Usually, the fitness function of a single-objective FS problem linearly aggregates these two objectives through a weighted sum method. Given a predefined direction (weight) vector, the single-objective FS task can explore the specified direction or area extensively. Different direction vectors result in different search directions in the objective space. Motivated by this, this article proposes a multiform framework, which solves a multiobjective FS task combined with its auxiliary single-objective FS tasks in a multitask environment. By setting different direction vectors, promising feature subsets from single-objective FS tasks can be utilized to boost the evolutionary search of the multiobjective FS task. By comparing with five classical and state-of-the-art multiobjective evolutionary algorithms, as well as four well-performing FS algorithms, the effectiveness and efficiency of the proposed method are verified via extensive experiments on 18 classification datasets. Furthermore, the effectiveness of the proposed method is also investigated in a noisy environment.
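The weighted-sum aggregation mentioned in this abstract can be illustrated with a small sketch. The exact formulation used in the article is not given here, so the minimization form below (classification error plus the ratio of selected features), the parameter names, and the example weights are assumptions.

```python
def weighted_sum_fitness(error_rate, n_selected, n_total, weight):
    """Assumed single-objective FS fitness (lower is better).

    weight in [0, 1] is the share given to classification error; the rest goes
    to the fraction of features kept. Together (weight, 1 - weight) plays the
    role of a direction vector in the two-objective space.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return weight * error_rate + (1.0 - weight) * (n_selected / n_total)


# The same candidate subset scored under three hypothetical direction vectors.
candidate = {"error_rate": 0.10, "n_selected": 5, "n_total": 50}
accuracy_focused = weighted_sum_fitness(weight=0.9, **candidate)
balanced = weighted_sum_fitness(weight=0.5, **candidate)
size_focused = weighted_sum_fitness(weight=0.1, **candidate)
```

Each weight defines one auxiliary single-objective task, so running several weights side by side pushes the search toward different regions of the accuracy versus subset-size trade-off.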
|
9
|
Zhang F, Mei Y, Nguyen S, Zhang M. Multitask Multiobjective Genetic Programming for Automated Scheduling Heuristic Learning in Dynamic Flexible Job-Shop Scheduling. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:4473-4486. [PMID: 36018866 DOI: 10.1109/tcyb.2022.3196887] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Evolutionary multitask multiobjective learning has been widely used for handling more than one multiobjective task simultaneously. However, it is rarely used in dynamic combinatorial optimization problems, which have valuable practical applications such as dynamic flexible job-shop scheduling (DFJSS) in manufacturing. Genetic programming (GP), as a popular hyperheuristic approach, has been used to learn scheduling heuristics for generating schedules for multitask single-objective DFJSS only. Searching in the heuristic space with GP is more difficult than in the solution space, since a small change on heuristics can lead to ineffective or even infeasible solutions. Multiobjective DFJSS is more challenging than single DFJSS, since a scheduling heuristic needs to cope with multiple objectives. To tackle this challenge, we first propose a multipopulation-based multitask multiobjective GP algorithm to preserve the quality of the learned scheduling heuristics for each task. Furthermore, we develop a multitask multiobjective GP algorithm with a task-oriented knowledge-sharing strategy to further improve the effectiveness of learning scheduling heuristics for DFJSS. The results show that the designed multipopulation-based GP algorithms, especially the one with the task-oriented knowledge-sharing strategy, can achieve good performance for all the examined tasks by maintaining the quality and diversity of individuals for corresponding tasks well. The learned Pareto fronts also show that the GP algorithm with task-oriented knowledge-sharing strategy can learn competitive scheduling heuristics for DFJSS on both of the objectives.
|
10
|
Wang X, Kang Q, Zhou M, Yao S, Abusorrah A. Domain Adaptation Multitask Optimization. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:4567-4578. [PMID: 36445998 DOI: 10.1109/tcyb.2022.3222101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Multitask optimization (MTO) is a new optimization paradigm that leverages useful information contained in multiple tasks so that the tasks help solve each other. It has attracted increasing attention in recent years and has achieved significant performance improvements. However, the solutions of distinct tasks usually obey different distributions. Because of these distribution differences, individuals produced by intertask learning may be unsuitable for the original task and may even impede overall solution efficiency. To avoid this, we propose a novel multitask evolutionary framework that enables knowledge aggregation and online learning among distinct tasks to solve MTO problems. Our proposal designs a domain adaptation-based mapping strategy to reduce the difference across solution domains and find more genetic traits to improve the effectiveness of information interactions. To further improve the algorithm performance, we propose a smart way to divide the initial population into different subpopulations and choose suitable individuals to learn. By ranking individuals in the target subpopulation, worse-performing individuals can learn from other tasks. The significant advantage of our proposed paradigm over the state of the art is verified via a series of MTO benchmark studies.
|
11
|
Liu D, Zhang X, Zhang Z, Jiang H. A Hybrid Feature Selection and Multi-Label Driven Intelligent Fault Diagnosis Method for Gearbox. SENSORS (BASEL, SWITZERLAND) 2023; 23:4792. [PMID: 37430707 DOI: 10.3390/s23104792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 05/11/2023] [Accepted: 05/13/2023] [Indexed: 07/12/2023]
Abstract
Gearboxes are utilized in practically all complicated machinery equipment because they have great transmission accuracy and load capacities, so their failure frequently results in significant financial losses. The classification of high-dimensional data remains a difficult topic despite the fact that numerous data-driven intelligent diagnosis approaches have been suggested and employed for compound fault diagnosis in recent years with successful outcomes. With the best diagnostic performance as the ultimate objective, a feature selection and fault decoupling framework is proposed in this paper. The framework uses multi-label K-nearest neighbors (ML-kNN) as the classifier and can automatically determine the optimal subset from the original high-dimensional feature set. The proposed feature selection method is a hybrid framework that can be divided into three stages. The Fisher score, information gain, and Pearson's correlation coefficient are three filter models that are used in the first stage to pre-rank candidate features. In the second stage, a weighting scheme based on the weighted average method is proposed to fuse the pre-ranking results obtained in the first stage, and the weights are optimized using a genetic algorithm to re-rank the features. The optimal subset is automatically and iteratively found in the third stage using three heuristic strategies: binary search, sequential forward search, and sequential backward search. The method takes feature irrelevance, redundancy, and inter-feature interaction into account in the selection process, and the selected optimal subsets have better diagnostic performance. On two gearbox compound fault datasets, ML-kNN performs exceptionally well using the optimal subset, with subset accuracies of 96.22% and 100%. The experimental findings demonstrate the effectiveness of the proposed method in predicting various labels for compound fault samples to identify and decouple compound faults. The proposed method performs better in terms of classification accuracy and optimal subset dimensionality when compared to other existing methods.
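The second-stage idea of fusing several filter rankings by a weighted average can be sketched as follows. This is only an illustration under stated assumptions: the min-max normalization, the fixed weights (the paper optimizes them with a genetic algorithm), the toy scores, and the names `min_max` and `fuse_and_rank` are not taken from the paper.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so that different filters become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_and_rank(score_lists, weights):
    """Weighted average of normalized filter scores, then rank features best-first.

    score_lists: one list of per-feature scores per filter (higher means better).
    weights: one non-negative weight per filter.
    Returns (fused scores, feature indices sorted from best to worst).
    """
    normalized = [min_max(scores) for scores in score_lists]
    total = sum(weights)
    n_features = len(normalized[0])
    fused = [
        sum(w * col[i] for w, col in zip(weights, normalized)) / total
        for i in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda i: fused[i], reverse=True)
    return fused, order


# Toy scores for three features from three filters (values invented for the example).
fisher_score = [0.2, 0.9, 0.5]
information_gain = [0.1, 0.8, 0.3]
pearson_abs = [0.3, 0.7, 0.6]

fused, order = fuse_and_rank(
    [fisher_score, information_gain, pearson_abs],
    weights=[0.5, 0.3, 0.2],  # placeholder weights; a GA would search over these
)
```

In a GA-driven version, each chromosome would encode one weight vector and its fitness would come from the diagnostic performance of the subset obtained from the resulting ranking.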
Affiliation(s)
- Di Liu
- College of Intelligent Manufacturing and Industrial Modernization, Xinjiang University, Urumchi 830017, China
| | - Xiangfeng Zhang
- College of Intelligent Manufacturing and Industrial Modernization, Xinjiang University, Urumchi 830017, China
| | - Zhiyu Zhang
- College of Intelligent Manufacturing and Industrial Modernization, Xinjiang University, Urumchi 830017, China
| | - Hong Jiang
- College of Intelligent Manufacturing and Industrial Modernization, Xinjiang University, Urumchi 830017, China
| |
|
12
|
Cheng F, Zhang C, Zhang X. An Evolutionary Multitasking Method for Multiclass Classification. IEEE COMPUT INTELL M 2022. [DOI: 10.1109/mci.2022.3199625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
13
|
Li Q, Wang P, Yuan J, Zhou Y, Mei Y, Ye M. A two-stage hybrid gene selection algorithm combined with machine learning models to predict the rupture status in intracranial aneurysms. Front Neurosci 2022; 16:1034971. [PMID: 36340761 PMCID: PMC9631203 DOI: 10.3389/fnins.2022.1034971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2022] [Accepted: 09/30/2022] [Indexed: 07/31/2023] Open
Abstract
An intracranial aneurysm (IA) is an abnormal swelling of cerebral vessels, and a subset of IAs can rupture, causing aneurysmal subarachnoid hemorrhage (aSAH), which often results in death or severe disability. Few studies have used an appropriate method of feature selection combined with machine learning by analyzing transcriptomic sequencing data to identify new molecular biomarkers. Following gene ontology (GO) and enrichment analysis using all 913 differentially expressed genes, we found that the distinct status of IAs could lead to differential innate immune responses. Considering that there are numerous irrelevant and redundant genes, we propose a mixed filter- and wrapper-based feature selection method. First, we used the Fast Correlation-Based Filter (FCBF) algorithm to filter out a large number of irrelevant and redundant genes in the raw dataset, and then used a wrapper feature selection method based on the Multi-layer Perceptron (MLP) neural network and Particle Swarm Optimization (PSO); accuracy (ACC) and mean square error (MSE) were used as the evaluation criteria. Finally, we constructed a novel 10-gene signature (YIPF1, RAB32, WDR62, ANPEP, LRRCC1, AADAC, GZMK, WBP2NL, PBX1, and TOR1B) with the proposed two-stage hybrid algorithm FCBF-MLP-PSO and used different machine learning models to predict the rupture status of IAs. The highest ACC value increased from 0.817 to 0.919 (12.5% increase), the highest area under the ROC curve (AUC) value increased from 0.87 to 0.94 (8.0% increase), and all evaluation metrics improved by approximately 10% after processing with our proposed gene selection algorithm. Therefore, these 10 informative genes used to predict the rupture status of IAs can complement imaging examinations in the clinic. This gene signature also provides new targets and approaches for the treatment of ruptured IAs.
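The percentage increases quoted for ACC and AUC are consistent with a relative (not absolute) change, which can be checked with a few lines. The helper name `relative_increase` is introduced here only for the check.

```python
def relative_increase(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Values quoted in the abstract.
acc_gain_pct = round(relative_increase(0.817, 0.919) * 100, 1)  # ACC: 0.817 to 0.919
auc_gain_pct = round(relative_increase(0.87, 0.94) * 100, 1)    # AUC: 0.87 to 0.94
```

The rounded results are 12.5 and 8.0, matching the reported figures. In absolute terms the gains are about 0.10 for ACC and 0.07 for AUC.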
Affiliation(s)
- Qingqing Li: School of Medical Information, Wannan Medical College, Wuhu, Anhui, China; Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
- Peipei Wang: School of Medical Information, Wannan Medical College, Wuhu, Anhui, China; Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
- Jinlong Yuan: Department of Neurosurgery, Yijishan Hospital of Wannan Medical College, Wannan Medical College, Wuhu, Anhui, China
- Yunfeng Zhou: Department of Radiology, Yijishan Hospital of Wannan Medical College, Wannan Medical College, Wuhu, Anhui, China
- Yaxin Mei: School of Medical Information, Wannan Medical College, Wuhu, Anhui, China; Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
- Mingquan Ye: School of Medical Information, Wannan Medical College, Wuhu, Anhui, China; Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
|
14
|
An evolutionary multitasking optimization algorithm via reference-point based nondominated sorting approach. EVOLUTIONARY INTELLIGENCE 2022. [DOI: 10.1007/s12065-022-00788-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
15
|
An Efficient Hybrid Feature Selection Method Using the Artificial Immune Algorithm for High-Dimensional Data. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1452301. [PMID: 36275946 PMCID: PMC9584659 DOI: 10.1155/2022/1452301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2022] [Revised: 07/31/2022] [Accepted: 08/29/2022] [Indexed: 12/02/2022]
Abstract
Feature selection provides the optimal subset of features for data mining models. However, current feature selection methods for high-dimensional data still require a better balance between feature subset quality and computational cost. In this paper, an efficient hybrid feature selection method (HFIA) based on artificial immune algorithm optimization is proposed to solve the feature selection problem of high-dimensional data. The algorithm combines filter algorithms with an improved clone selection algorithm to explore the feature space of high-dimensional data. According to the target requirements of feature selection, and drawing on biological research results, the method introduces a lethal mutation mechanism and the Cauchy operator to improve the search performance of the algorithm. Moreover, an adaptive adjustment factor is introduced in the mutation and update phases of the algorithm. The effective combination of these mechanisms gives the algorithm better search ability and lower computational cost. Experimental comparisons with 19 state-of-the-art feature selection methods are conducted on 25 high-dimensional benchmark datasets. The results show that the feature reduction rate for all datasets is above 99%, and the performance improvement for the classifier is between 5% and 48.33%. Compared with the five classical filter feature selection methods, the computational cost of HFIA is lower than that of two of them, and HFIA is far better than all five in terms of feature reduction rate and classification accuracy improvement. Compared with the 14 hybrid feature selection methods reported in the latest literature, the average winning rates in terms of classification accuracy, feature reduction rate, and computational cost are 85.83%, 88.33%, and 96.67%, respectively.
|
16
|
Compressed-Encoding Particle Swarm Optimization with Fuzzy Learning for Large-Scale Feature Selection. Symmetry (Basel) 2022. [DOI: 10.3390/sym14061142] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Particle swarm optimization (PSO) is a promising method for feature selection. When PSO is used to solve the feature selection problem, each feature is initially equally likely to be selected or not selected, and this probability is optimized during the evolutionary process. That is, the feature selection probability is optimized from symmetry (i.e., 50% vs. 50%) to asymmetry (i.e., some features are selected with a higher probability, and some with a lower probability) to help particles obtain the optimal feature subset. However, when dealing with large-scale features, PSO still suffers from poor search performance and long running time. In addition, a suitable particle representation for the discrete binary optimization problem of feature selection is still greatly needed. This paper proposes a compressed-encoding PSO with fuzzy learning (CEPSO-FL) for the large-scale feature selection problem. It uses the N-base encoding method to represent particles and designs a particle update mechanism based on the Hamming distance and a fuzzy learning strategy, which can be performed in the discrete space. It also proposes a local search strategy that dynamically skips some dimensions when updating particles, thus reducing both the search space and the running time. The experimental results show that CEPSO-FL performs well for large-scale feature selection problems. The solutions obtained by CEPSO-FL contain small feature subsets and perform excellently in classification problems.
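The abstract does not spell out how N-base encoding works. A minimal sketch of one plausible reading follows, assuming N = 2^k and that every k consecutive bits of the binary feature mask are packed into one base-N digit. The function names, the choice k = 3, and the bit-level Hamming distance are illustrative assumptions, not the paper's implementation.

```python
from typing import List


def compress(mask: List[int], k: int = 3) -> List[int]:
    """Pack a 0/1 feature mask into base-2**k digits, k bits per digit."""
    # Pad the tail with zeros (unselected features) to a multiple of k.
    padded = mask + [0] * (-len(mask) % k)
    digits = []
    for i in range(0, len(padded), k):
        value = 0
        for bit in padded[i:i + k]:
            value = (value << 1) | bit
        digits.append(value)
    return digits


def decompress(digits: List[int], n_features: int, k: int = 3) -> List[int]:
    """Unpack base-2**k digits back into a 0/1 mask of length n_features."""
    bits: List[int] = []
    for d in digits:
        bits.extend((d >> shift) & 1 for shift in range(k - 1, -1, -1))
    return bits[:n_features]


def hamming(a: List[int], b: List[int]) -> int:
    """Count differing bits between two compressed particles of equal length."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


mask = [1, 0, 1, 0, 0, 1, 1, 1]          # 8 features, 5 selected
digits = compress(mask)                   # 3 digits instead of 8 bits
restored = decompress(digits, len(mask))
print(digits, restored)
```

With k = 3 the particle length shrinks to roughly a third of the feature count, which is the kind of search-space reduction the abstract attributes to compressed encoding. How CEPSO-FL actually chooses N and defines its distance should be checked against the paper.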
|
17
|
Gupta A, Zhou L, Ong YS, Chen Z, Hou Y. Half a Dozen Real-World Applications of Evolutionary Multitasking, and More. IEEE COMPUT INTELL M 2022. [DOI: 10.1109/mci.2022.3155332] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/05/2022]
|
18
|
Wang Y, Gao X, Ru X, Sun P, Wang J. A hybrid feature selection algorithm and its application in bioinformatics. PeerJ Comput Sci 2022; 8:e933. [PMID: 35494789 PMCID: PMC9044222 DOI: 10.7717/peerj-cs.933] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/21/2022] [Accepted: 03/03/2022] [Indexed: 06/14/2023]
Abstract
Feature selection is an independent technology for high-dimensional datasets that has been widely applied in a variety of fields. With the vast expansion of information, such as bioinformatics data, there has been an urgent need in recent decades to investigate more effective and accurate feature selection methods. Here, we propose the hybrid MMPSO method, which combines a feature ranking method with a heuristic search method to obtain an optimal subset that yields higher classification accuracy. In this study, ten datasets obtained from the UCI Machine Learning Repository were analyzed to demonstrate the superiority of our method. The MMPSO algorithm outperformed other algorithms in terms of classification accuracy while utilizing the same number of features. We then applied the method to a biological dataset containing gene expression information about liver hepatocellular carcinoma (LIHC) samples obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). On the basis of the MMPSO algorithm, we identified an 18-gene signature that performed well in distinguishing normal samples from tumours. Nine of the 18 differentially expressed genes were significantly up-regulated in LIHC tumour samples, and the area under the curve (AUC) of the combination of seven genes (ADRA2B, ERAP2, NPC1L1, PLVAP, POMC, PYROXD2, TRIM29) in distinguishing tumours from normal samples was greater than 0.99. Six genes (ADRA2B, PYROXD2, CACHD1, FKBP1B, PRKD1 and RPL7AP6) were significantly correlated with survival time. The MMPSO algorithm can be used to effectively extract features from a high-dimensional dataset, which will provide new clues for identifying biomarkers or therapeutic targets from biological data and more perspectives in tumour research.
Affiliation(s)
- Yangyang Wang: School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Xiaoguang Gao: School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Xinxin Ru: School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Pengzhan Sun: School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Jihan Wang: Institute of Medical Research, Northwestern Polytechnical University, Xi’an, Shaanxi, China
|
19
|
Liu J, Anavatti S, Garratt M, Tan KC, Abbass HA. A survey, taxonomy and progress evaluation of three decades of swarm optimisation. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-10095-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|