1
Gao C, Zhou J, Wang X, Pedrycz W. Granule Margin-Based Feature Selection in Weighted Neighborhood Systems. IEEE Trans Cybern 2025; 55:2151-2164. [PMID: 40072867 DOI: 10.1109/tcyb.2025.3544693] [Indexed: 03/14/2025]
Abstract
Neighborhood rough sets are an effective model for handling numerical and categorical data entangled with vagueness, imprecision, or uncertainty. However, existing neighborhood rough set models and their feature selection methods treat each sample equally, whereas different types of samples inherently play different roles in constructing neighborhood granules and evaluating the goodness of features. In this study, the sample weight information is first introduced into neighborhood rough sets, and a novel weighted neighborhood rough set model is consequently constructed. Then, considering the lack of sample weight information in practical data, a margin-based weight optimization function is designed, based on which a gradient descent algorithm is provided to adaptively learn sample weights through maximizing sample margins. Finally, an average granule margin measure is put forward for feature selection, and a forward-adding heuristic algorithm is developed to generate an optimal feature subset. The proposed method constructs the weighted neighborhood rough sets using sample weights for the first time and is able to yield compact feature subsets with a large margin. Extensive experiments and statistical analysis on UCI datasets show that the proposed method achieves highly competitive performance in terms of feature reduction rate and classification accuracy when compared with other state-of-the-art methods.
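The margin-driven forward search described above can be illustrated with a minimal sketch. It assumes a Relief-style hypothesis margin (distance to the nearest sample of another class minus distance to the nearest sample of the same class) as a stand-in for the paper's average granule margin, and it takes the sample weights `w` as given instead of learning them by gradient descent. All function names are hypothetical.

```python
import math

def subspace_dist(a, b, feats):
    """Euclidean distance between samples a and b using only the features in `feats`."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in feats))

def weighted_average_margin(X, y, w, feats):
    """Weighted mean over samples of d(nearest miss) - d(nearest hit) (assumed margin)."""
    total = 0.0
    for i, xi in enumerate(X):
        hits = [subspace_dist(xi, X[j], feats) for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [subspace_dist(xi, X[j], feats) for j in range(len(X)) if y[j] != y[i]]
        if hits and misses:
            total += w[i] * (min(misses) - min(hits))
    return total / sum(w)

def forward_select(X, y, w, k):
    """Forward-adding heuristic: repeatedly add the feature that maximizes the margin."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: weighted_average_margin(X, y, w, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.5], [0.1, 0.1], [1.0, 0.4], [1.1, 0.2]]
y = [0, 0, 1, 1]
w = [1.0, 1.0, 1.0, 1.0]
print(forward_select(X, y, w, 1))  # [0]
```

Raising the weight of a sample makes its margin count more in the average, which is the intuition behind weighting samples differently when evaluating features.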
2
Zhang K, Liang W, Cao P, Mao Z, Yang J, Zaiane OR. CorLabelNet: a comprehensive framework for multi-label chest X-ray image classification with correlation guided discriminant feature learning and oversampling. Med Biol Eng Comput 2025; 63:1045-1058. [PMID: 39609353 DOI: 10.1007/s11517-024-03247-0] [Received: 05/28/2024] [Accepted: 11/12/2024] [Indexed: 11/30/2024]
Abstract
Recent advancements in deep learning techniques have significantly improved multi-label chest X-ray (CXR) image classification for clinical diagnosis. However, most previous studies neither learn label correlations effectively nor take full advantage of them to improve multi-label classification performance. In addition, the labels of CXR images are usually severely imbalanced, which biases the model towards the majority classes. To address these challenges, we introduce a framework that not only learns label correlations but also uses them to guide feature learning and oversampling. Our approach incorporates self-attention to capture high-order label correlations and considers label correlations from both global and local perspectives. We then propose a consistency constraint and a multi-label contrastive loss to enhance feature learning. To alleviate the imbalance issue, we further propose an oversampling approach that exploits the learned label correlations to identify crucial seed samples for oversampling. We repeat 5-fold cross-validation three times, and our approach achieves the best performance on both the CheXpert and ChestX-Ray14 datasets. Learning accurate label correlations is important for multi-label classification, and taking full advantage of them benefits discriminative feature learning and oversampling. A comparative analysis with state-of-the-art approaches highlights the effectiveness of the proposed methods.
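In its simplest, non-learned form, a label correlation is the conditional co-occurrence of labels across images. The sketch below illustrates only that empirical statistic; it is not the self-attention mechanism the authors use to learn high-order correlations, and the function name and toy label matrix are hypothetical.

```python
def conditional_cooccurrence(Y):
    """P[i][j] estimates P(label j present | label i present) from a binary
    label matrix Y (one row per image, one column per label)."""
    n_labels = len(Y[0])
    counts = [[0] * n_labels for _ in range(n_labels)]
    for row in Y:
        present = [j for j, v in enumerate(row) if v]
        for i in present:
            for j in present:
                counts[i][j] += 1
    P = []
    for i in range(n_labels):
        support = counts[i][i]  # number of images that carry label i
        P.append([counts[i][j] / support if support else 0.0 for j in range(n_labels)])
    return P

# Hypothetical label matrix: 4 images x 3 labels.
Y = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1]]
P = conditional_cooccurrence(Y)
print(P[1][0], P[2][0])  # 1.0 0.5
```

The matrix is asymmetric: here P[0][1] is 2/3 while P[1][0] is 1.0, so a correlation-aware model has to treat label pairs directionally.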
Affiliation(s)
- Kai Zhang
- Computer Science and Engineering, Northeastern University, Shenyang, China
- Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China
- Wei Liang
- Computer Science and Engineering, Northeastern University, Shenyang, China
- Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China
- Peng Cao
- Computer Science and Engineering, Northeastern University, Shenyang, China
- Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China
- Zhaoyang Mao
- Computer Science and Engineering, Northeastern University, Shenyang, China
- Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China
- Jinzhu Yang
- Computer Science and Engineering, Northeastern University, Shenyang, China
- Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China
- Osmar R Zaiane
- Alberta Machine Intelligence Institute, University of Alberta, Edmonton, Canada
3
Li G, Yu Z, Yang K, Chen CLP, Li X. Ensemble-Enhanced Semi-Supervised Learning With Optimized Graph Construction for High-Dimensional Data. IEEE Trans Pattern Anal Mach Intell 2025; 47:1103-1119. [PMID: 39446542 DOI: 10.1109/tpami.2024.3486319] [Indexed: 10/26/2024]
Abstract
Graph-based methods have demonstrated exceptional performance in semi-supervised classification. However, existing graph-based methods typically construct either a predefined graph in the original space or an adaptive graph within the output space, which often limits their ability to fully utilize prior information and capture the optimal intrinsic data distribution, particularly in high-dimensional data with abundant redundant and noisy features. This paper introduces a novel approach: Semi-Supervised Classification with Optimized Graph Construction (SSC-OGC). SSC-OGC leverages both predefined and adaptive graphs to explore the intrinsic data distribution and effectively employ prior information. Additionally, a graph constraint regularization term (GCR) and a collaborative constraint regularization term (CCR) are incorporated to further enhance the quality of the adaptive graph structure and the learned subspace, respectively. To eliminate the negative effect of constructing a predefined graph in the original data space, we further propose a Hybrid Subspace Ensemble-enhanced framework based on the proposed Optimized Graph Construction method (HSE-OGC). Specifically, we construct multiple hybrid subspaces, each consisting of carefully chosen features from the original data, to achieve high-quality and diverse space representations. HSE-OGC then constructs multiple predefined graphs within the hybrid subspaces and trains multiple SSC-OGC classifiers that complement each other, significantly improving the overall performance. Experiments on various high-dimensional datasets demonstrate the outstanding performance of HSE-OGC.
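For readers unfamiliar with graph-based semi-supervised classification, the core idea can be sketched with classic label propagation (the local-and-global-consistency iteration F ← αSF + (1 − α)Y) on a single predefined Gaussian graph. This is a deliberately simplified stand-in: it has no adaptive graph, no GCR/CCR regularizers, and no hybrid-subspace ensemble, and `sigma`, `alpha`, and `iters` are assumed values.

```python
import math

def label_propagation(X, labels, n_classes, sigma=1.0, alpha=0.9, iters=100):
    """labels[i] is a class index for labelled samples and None for unlabelled ones."""
    n = len(X)
    # Predefined graph: Gaussian affinities with a zero diagonal.
    W = [[0.0 if i == j else math.exp(-math.dist(X[i], X[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        # Spread scores along graph edges while pulling back toward the known labels.
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]

X = [[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]]
labels = [0, None, None, 1, None, None]
print(label_propagation(X, labels, 2))  # [0, 0, 0, 1, 1, 1]
```

With only one labelled point per cluster, the unlabelled points inherit the label of the cluster they are densely connected to; the quality of the graph therefore drives the result, which is the motivation for optimizing graph construction.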
4
Sharma Y, Singh BK, Dhurandhar S. Vocal tasks-based EEG and speech signal analysis in children with neurodevelopmental disorders: a multimodal investigation. Cogn Neurodyn 2024; 18:2387-2403. [PMID: 39555290 PMCID: PMC11564584 DOI: 10.1007/s11571-024-10096-y] [Received: 07/18/2023] [Revised: 02/06/2024] [Accepted: 02/24/2024] [Indexed: 11/19/2024]
Abstract
Neurodevelopmental disorders (NDs) often hamper multiple functions of a child's brain. Despite several studies on their neural and speech responses, multimodal research on NDs is extremely rare. The present work examined the electroencephalography (EEG) and speech signals of ND and control children who performed Hindi-language vocal tasks (V) of seven categories, viz. 'vowel', 'consonant', 'one syllable', 'multi-syllable', 'compound', 'complex', and 'sentence' (V1-V7). Statistical testing of EEG parameters showed substantially higher beta and gamma band energies in the frontal, central, and temporal head sites of NDs for tasks V1-V5, and additionally in the parietal sites for V6. For the 'sentence' task (V7), the NDs yielded significantly higher theta and lower alpha energies in the parietal area. These findings imply that even performing a general context-based task imposes a heavy cognitive load on neurodevelopmental subjects. The ND children also exhibited poor auditory comprehension while producing long phrases. Further, the speech signal analysis revealed significantly higher amplitude (for V1-V7) and frequency (for V3-V7) perturbations in the voices of ND children. Moreover, subjects were classified as ND or control using EEG and speech features. We attained 100% accuracy, precision, and F-measure using the EEG features of all tasks and the speech features of the 'complex' task. Jointly, the 'complex' task emerged as the best vocal stimulus among V1-V7 for characterizing ND brains. We also inspected the inter-relations between EEG energies and speech attributes of the ND group. Our work thus presents a unique multimodal layout for exploring the distinctiveness of neuro-impaired children.
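Band energies such as those compared above are commonly computed from the power spectrum of each EEG channel. A minimal sketch, assuming the conventional band limits listed in the hypothetical `BANDS` table (the paper's exact limits, windowing, and preprocessing are not given in the abstract) and a naive discrete Fourier transform to stay dependency-free:

```python
import math

# Assumed, conventional EEG band limits in Hz (lower bound inclusive, upper exclusive).
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_energies(x, fs):
    """Sum the squared DFT magnitudes of x whose frequency falls in each band."""
    n = len(x)
    energies = {name: 0.0 for name in BANDS}
    for k in range(n // 2 + 1):  # non-negative frequencies only
        freq = k * fs / n
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                energies[name] += re * re + im * im
    return energies

fs = 128
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]  # 1 s of a 10 Hz tone
e = band_energies(signal, fs)
print(max(e, key=e.get))  # alpha
```

A real pipeline would use an FFT library and typically a windowed estimate such as Welch's method; the naive transform here only keeps the example self-contained.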
Affiliation(s)
- Yogesh Sharma
- Department of Biomedical Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh 492010 India
- Bikesh Kumar Singh
- Department of Biomedical Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh 492010 India
5
Liu Z, Si L, Shi S, Li J, Zhu J, Lee WH, Lo SL, Yan X, Chen B, Fu F, Zheng Y, Wang G. Classification of Three Anesthesia Stages Based on Near-Infrared Spectroscopy Signals. IEEE J Biomed Health Inform 2024; 28:5270-5279. [PMID: 38833406 DOI: 10.1109/jbhi.2024.3409163] [Indexed: 06/06/2024]
Abstract
Proper monitoring of anesthesia stages can ensure the safe performance of clinical surgeries. In this study, different anesthesia stages were classified from near-infrared spectroscopy (NIRS) signals using machine learning. The cerebral hemodynamic variable of right proximal oxyhemoglobin (HbO2) was collected during the maintenance (MNT), emergence (EM), and consciousness (CON) stages, and the differences among the three stages were compared using phase-amplitude coupling (PAC). PAC was then combined with time-domain features, both linear (mean, standard deviation, maximum, minimum, and range) and nonlinear (sample entropy), and with frequency-domain power features; feature selection was performed, and classification was finally carried out with a support vector machine (SVM) classifier. The results show that the PAC of the NIRS signal was gradually enhanced as anesthesia deepened. A three-class accuracy of 69.27% was obtained, which exceeded the accuracy achieved with any single category of features. These results indicate the feasibility of using NIRS signals to classify three or even more anesthesia stages, providing insight into the development of new anesthesia monitoring modalities.
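Of the features listed above, sample entropy is the least obvious to compute. A minimal sketch of the standard definition SampEn(m, r) = −ln(A/B), where B and A count pairs of length-m and length-(m + 1) templates whose Chebyshev distance is at most r; the defaults m = 2 and r = 0.2 × standard deviation are common conventions assumed here, not values reported in the paper.

```python
import math
import random
import statistics

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) = -ln(A / B); lower values indicate a more regular signal."""
    if r is None:
        r = 0.2 * statistics.pstdev(x)  # assumed tolerance convention
    n = len(x)

    def matching_pairs(length):
        # Both template lengths use the same n - m starting points.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    B = matching_pairs(m)
    A = matching_pairs(m + 1)
    if A == 0 or B == 0:
        return math.inf  # undefined: no matching templates
    return -math.log(A / B)

regular = [0.0, 1.0] * 50  # perfectly periodic signal
rng = random.Random(0)
noisy = [rng.random() for _ in range(200)]  # irregular signal
print(sample_entropy(regular) < sample_entropy(noisy))  # True
```

The quadratic pair count is fine for short epochs; long recordings usually call for a vectorized or tree-based implementation.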
6
Cai Z, Li Z, Chen Z, Zhuo H, Zheng L, Wu X, Liu Y. Device-Free Wireless Sensing for Gesture Recognition Based on Complementary CSI Amplitude and Phase. Sensors (Basel) 2024; 24:3414. [PMID: 38894205 PMCID: PMC11175107 DOI: 10.3390/s24113414] [Received: 04/07/2024] [Revised: 05/17/2024] [Accepted: 05/22/2024] [Indexed: 06/21/2024]
Abstract
By integrating sensing capability into wireless communication, wireless sensing technology has become a promising contactless and non-line-of-sight sensing paradigm that exploits the dynamic characteristics of channel state information (CSI) to recognize human behaviors. In this paper, we develop an effective device-free human gesture recognition (HGR) system based on WiFi wireless sensing technology, in which the complementary CSI amplitude and phase of the communication link are jointly exploited. To improve the quality of the collected CSI, a linear transform-based data processing method is first used to eliminate the phase offset and noise and to reduce the impact of multipath effects. Then, six time- and frequency-domain features are chosen for both amplitude and phase, namely the mean, variance, root mean square, interquartile range, energy entropy, and power spectral entropy, and a feature selection algorithm based on filtering and principal component analysis is proposed to remove irrelevant and redundant features, yielding a feature subspace that distinguishes different gestures. On this basis, a support vector machine-based stacking algorithm is proposed to classify gestures from the selected, complementary amplitude and phase features. Lastly, we conduct experiments in a practical scenario with one transmitter and one receiver. The results demonstrate that the proposed HGR system achieves an average accuracy of 98.3% and an F1-score above 97%.
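The six per-stream features listed above can be sketched for a single amplitude or phase sequence. This is a hedged illustration: the frame-based energy entropy (over a hypothetical `n_frames` equal frames) and the power spectral entropy over DFT bins 0..n/2 are one common pair of definitions, assumed here because the abstract does not define them, and the filtering/PCA selection and stacking classifier are not shown.

```python
import math
import statistics

def shannon_entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries ignored)."""
    return -sum(v * math.log2(v) for v in p if v > 0)

def csi_features(x, n_frames=4):
    """Mean, variance, RMS, IQR, energy entropy, and power spectral entropy of x."""
    n = len(x)
    q1, _, q3 = statistics.quantiles(x, n=4)
    # Energy entropy: how the signal energy is spread over equal-length frames.
    size = n // n_frames
    frame_energy = [sum(v * v for v in x[f * size:(f + 1) * size]) for f in range(n_frames)]
    total_energy = sum(frame_energy)
    # Power spectral entropy: how power is spread over DFT bins 0..n//2.
    psd = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        psd.append(re * re + im * im)
    total_power = sum(psd)
    return {
        "mean": statistics.fmean(x),
        "variance": statistics.pvariance(x),
        "rms": math.sqrt(sum(v * v for v in x) / n),
        "iqr": q3 - q1,
        "energy_entropy": shannon_entropy([e / total_energy for e in frame_energy]) if total_energy else 0.0,
        "spectral_entropy": shannon_entropy([p / total_power for p in psd]) if total_power else 0.0,
    }

flat = csi_features([2.0] * 16)
print(flat["energy_entropy"])  # 2.0
```

A constant stream spreads its energy evenly over the four frames (entropy log2 4 = 2 bits) but concentrates all spectral power in the DC bin, so its spectral entropy is essentially zero; gesture-induced fluctuations push the two entropies in different directions.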
Affiliation(s)
- Zhijia Cai
- School of Electronics and Communication Engineering, Guangzhou University, Guangzhou 510006, China
- School of Information and Optoelectronic Science and Engineering, South China Normal University, Guangzhou 510006, China
- Zehao Li
- School of Information and Optoelectronic Science and Engineering, South China Normal University, Guangzhou 510006, China
- Zikai Chen
- School of Information and Optoelectronic Science and Engineering, South China Normal University, Guangzhou 510006, China
- Hongyang Zhuo
- School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
- Lei Zheng
- School of Electronics and Communication Engineering, Guangzhou University, Guangzhou 510006, China
- Xianda Wu
- School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
- Yong Liu
- School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
7
Li ZH, Wang RL, Lu M, Wang X, Huang YP, Yang JW, Zhang TY. A novel method for identifying aerobic granular sludge state using sorting, densification and clarification dynamics during the settling process. Water Res 2024; 253:121336. [PMID: 38382291 DOI: 10.1016/j.watres.2024.121336] [Received: 09/24/2023] [Revised: 01/22/2024] [Accepted: 02/17/2024] [Indexed: 02/23/2024]
Abstract
Aerobic granular sludge is one of the most promising biological wastewater treatment technologies, yet maintaining its stability remains a challenge for its application, and predicting the state of the granules is essential to addressing this issue. This study explored the potential of dynamic texture entropy, derived from settling images, as a predictive tool for the state of granular sludge. Three processes, namely traditional thickening, the often overlooked clarification, and innovative particle sorting, were used to capture the complexity and diversity of granules. Rapid sorting during settling was found to indicate stable granules, which helps identify the state of the granules. Furthermore, a relationship between sorting time and granule heterogeneity was identified, which helps in adjusting the selection pressure. Features of the dynamic texture entropy correlated well with the respirogram, with R² values of 0.86 and 0.91 for the specific endogenous respiration rate (SOURe) and the specific quasi-endogenous respiration rate (SOURq), respectively, providing a biologically based approach for monitoring the state of granules. Models using dynamic texture entropy features as input reached a classification accuracy greater than 0.90, significantly higher than models using conventional features, demonstrating a clear advantage of this approach. These findings contribute to developing robust monitoring tools that facilitate the maintenance of stable granular sludge operations.
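The R² values quoted above measure how much of the variance in a respiration rate is explained by a model of an entropy feature. As a hedged sketch, assuming a simple ordinary least-squares line (the abstract does not state the regression form), the coefficient of determination R² = 1 − SS_res/SS_tot can be computed as follows; the toy numbers are hypothetical, not the study's data.

```python
def linear_fit_r2(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    ss_tot = sum((yi - my) ** 2 for yi in y)  # total variation around the mean
    return a, b, 1 - ss_res / ss_tot

entropy_feature = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical feature values
sour = [2.1, 3.9, 6.2, 7.8, 10.1]            # hypothetical respiration rates
print(linear_fit_r2(entropy_feature, sour))
```

An R² of 0.91 therefore means that roughly 91% of the variation in the respiration rate is accounted for by the fitted relationship.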
Affiliation(s)
- Zhi-Hua Li
- Key Laboratory of Northwest Water Resource, Environment, and Ecology, MOE, School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; Xi'an Key Laboratory of Intelligent Equipment Technology for Environmental Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
- Ruo-Lan Wang
- Key Laboratory of Northwest Water Resource, Environment, and Ecology, MOE, School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; Xi'an Key Laboratory of Intelligent Equipment Technology for Environmental Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
- Meng Lu
- Key Laboratory of Northwest Water Resource, Environment, and Ecology, MOE, School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; Xi'an Key Laboratory of Intelligent Equipment Technology for Environmental Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
- Xin Wang
- Key Laboratory of Northwest Water Resource, Environment, and Ecology, MOE, School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; Xi'an Key Laboratory of Intelligent Equipment Technology for Environmental Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
- Yong-Peng Huang
- Key Laboratory of Northwest Water Resource, Environment, and Ecology, MOE, School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; Xi'an Key Laboratory of Intelligent Equipment Technology for Environmental Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
- Jia-Wei Yang
- Key Laboratory of Northwest Water Resource, Environment, and Ecology, MOE, School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; Xi'an Key Laboratory of Intelligent Equipment Technology for Environmental Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
- Tian-Yu Zhang
- Department of Mathematical Sciences, Montana State University, Bozeman, MT 59717, USA
8
Zhang K, Liang W, Cao P, Liu X, Yang J, Zaiane O. Label correlation guided discriminative label feature learning for multi-label chest image classification. Comput Methods Programs Biomed 2024; 245:108032. [PMID: 38244339 DOI: 10.1016/j.cmpb.2024.108032] [Received: 11/12/2023] [Revised: 01/02/2024] [Accepted: 01/12/2024] [Indexed: 01/22/2024]
Abstract
BACKGROUND AND OBJECTIVE Multi-label Chest X-ray (CXR) images often contain rich label relationship information, which is beneficial for improving classification performance. However, because of the intricate relationships among labels, most existing works fail to effectively learn and make full use of the label correlations, resulting in limited classification performance. In this study, we propose a multi-label learning framework that learns and leverages label correlations to improve multi-label CXR image classification. METHODS In this paper, we capture the global label correlations through the self-attention mechanism. Meanwhile, to better use label correlations to guide feature learning, we decompose the image-level features into label-level features. Furthermore, we enhance label-level feature learning in an end-to-end manner through a consistency constraint between global and local label correlations and a label correlation guided multi-label supervised contrastive loss. RESULTS To demonstrate the superior performance of the proposed approach, we repeat 5-fold cross-validation three times on the CheXpert dataset. Our approach obtains an average F1 score of 44.6% and an AUC of 76.5%, improvements of 7.7% and 1.3%, respectively, over the state-of-the-art results. CONCLUSION More accurate label correlations and full use of the learned correlations help learn more discriminative label-level features. Experimental results demonstrate that our approach achieves highly competitive performance compared with state-of-the-art algorithms.
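The evaluation protocol above, 5-fold cross-validation repeated three times, yields 15 train/test splits whose scores are then averaged. A minimal index-splitting sketch with a hypothetical `repeated_kfold` helper; it uses plain shuffled folds and does not attempt the multi-label stratification a real CXR study may need.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=3, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated shuffled k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle for every repetition
        for f in range(k):
            test = sorted(idx[f::k])  # every k-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in range(n_samples) if i not in test_set]
            yield r, f, train, test

splits = list(repeated_kfold(23))
print(len(splits))  # 15
```

Within each repetition every sample is tested exactly once, so averaging over all 15 folds reduces the variance that a single random partition would introduce.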
Affiliation(s)
- Kai Zhang
- Computer Science and Engineering, Northeastern University, Shenyang, China
- Wei Liang
- Computer Science and Engineering, Northeastern University, Shenyang, China
- Peng Cao
- Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, China
- Xiaoli Liu
- DAMO Academy, Alibaba Group, Hangzhou, China
- Jinzhu Yang
- Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, China
- Osmar Zaiane
- Alberta Machine Intelligence Institute, University of Alberta, Edmonton, Alberta, Canada
9
Rabie AH, Saleh AI. Diseases diagnosis based on artificial intelligence and ensemble classification. Artif Intell Med 2024; 148:102753. [PMID: 38325931 DOI: 10.1016/j.artmed.2023.102753] [Received: 07/03/2023] [Revised: 12/11/2023] [Accepted: 12/22/2023] [Indexed: 02/09/2024]
Abstract
BACKGROUND In recent years, Computer Aided Diagnosis (CAD) has become an important research area that has attracted many researchers. In medical diagnostic systems, several attempts have been made to build and enhance CAD applications to avoid errors that can lead to dangerously misleading medical treatments. The most promising opportunity for improving the performance of CAD systems lies in integrating Artificial Intelligence (AI) into medicine. This allows effective automation of the traditional manual workflow, which is slow, inaccurate, and affected by human errors. AIMS This paper aims to provide a complete Computer Aided Disease Diagnosis (CAD2) strategy based on Machine Learning (ML) techniques that can help clinicians make better medical decisions. METHODS The proposed CAD2 consists of three main sequential phases, namely (i) the Outlier Rejection Phase (ORP), (ii) the Feature Selection Phase (FSP), and (iii) the Classification Phase (CP). ORP rejects outliers using a new Outlier Rejection Technique (ORT) that contains two sequential stages called Fast Outlier Rejection (FOR) and Accurate Outlier Rejection (AOR). The most informative features are selected in FSP using the Hybrid Selection Technique (HST). HST includes two main stages: the Quick Selection Stage (QS2), which uses the Fisher score as a filter method, and the Precise Selection Stage (PS2), which uses a Hybrid Bio-inspired Optimization (HBO) technique as a wrapper method. Finally, the actual diagnosis takes place in CP, which relies on an Ensemble Classification Technique (ECT). RESULTS The proposed CAD2 has been tested experimentally against recent disease diagnostic strategies on two datasets, the first containing several diseases and the second containing data for Covid-19 patients only. Experimental results show the high efficiency of the proposed CAD2 in terms of accuracy, error, precision, and recall compared with the competing strategies.
Additionally, the CAD2 strategy achieves the best results in the Wilcoxon signed-rank and Friedman tests against the other strategies on both datasets. CONCLUSION The CAD2 strategy based on ORP, FSP, and CP gave a more accurate diagnosis than the other strategies, achieving the highest accuracy and the lowest error and implementation time.
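The Quick Selection Stage described above ranks features with the Fisher score. A minimal sketch of its common form F_j = Σ_c n_c(μ_{c,j} − μ_j)² / Σ_c n_c σ²_{c,j}, followed by keeping the top-k features; the helper names and `k` are hypothetical, and the PS2 wrapper (HBO), the outlier rejection phase, and the ensemble classifier are not reproduced.

```python
def fisher_scores(X, y):
    """Between-class scatter divided by within-class scatter, per feature."""
    classes = sorted(set(y))
    scores = []
    for f in range(len(X[0])):
        col = [row[f] for row in X]
        mu = sum(col) / len(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mu_c = sum(vals) / len(vals)
            var_c = sum((v - mu_c) ** 2 for v in vals) / len(vals)
            between += len(vals) * (mu_c - mu) ** 2
            within += len(vals) * var_c
        scores.append(between / within if within > 0 else float("inf"))
    return scores

def top_k_features(X, y, k):
    """Quick filter step: indices of the k highest-scoring features."""
    s = fisher_scores(X, y)
    return sorted(range(len(s)), key=lambda f: s[f], reverse=True)[:k]

X = [[0.0, 1.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.0]]
y = [0, 0, 1, 1]
print(top_k_features(X, y, 1))  # [0]
```

Because the Fisher score looks at each feature independently, it is cheap enough to prune the feature space before a costlier wrapper search, which is the role a filter stage plays in a two-stage selector.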
Affiliation(s)
- Asmaa H Rabie
- Computer Engineering and Systems Dept., Faculty of Engineering, Mansoura University, Mansoura, Egypt
- Ahmed I Saleh
- Computer Engineering and Systems Dept., Faculty of Engineering, Mansoura University, Mansoura, Egypt
10
Priyadharshini M, Banu AF, Sharma B, Chowdhury S, Rabie K, Shongwe T. Hybrid Multi-Label Classification Model for Medical Applications Based on Adaptive Synthetic Data and Ensemble Learning. Sensors (Basel) 2023; 23:6836. [PMID: 37571619 PMCID: PMC10422387 DOI: 10.3390/s23156836] [Received: 07/08/2023] [Revised: 07/25/2023] [Accepted: 07/27/2023] [Indexed: 08/13/2023]
Abstract
In recent years, both machine learning and computer vision have seen growing use of multi-label categorization. Existing research uses SMOTE for data balancing, but SMOTE does not consider that nearby examples may belong to different classes when producing synthetic samples, which can increase class overlap and noise. To avoid this problem, this work presents an innovative technique called Adaptive Synthetic Data-Based Multi-label Classification (ASDMLC). Adaptive Synthetic (ADASYN) sampling is a strategy for learning from imbalanced data sets that weights minority class instances by their learning difficulty and creates synthetic data for hard-to-learn minority class cases. Numerical variables are normalized with the Min-Max technique, which rescales each attribute to the range 0 to 1 to standardize the magnitude of each variable's impact on the outcomes. To raise the accuracy of multi-label classification, Velocity-Equalized Particle Swarm Optimization (VPSO) is used for feature selection; to overcome the premature convergence problem, standard PSO is improved by equalizing the velocity in each dimension of the problem. To expose the inherent label dependencies, the outputs of a multi-label classification ensemble of an Adaptive Neuro-Fuzzy Inference System (ANFIS), a Probabilistic Neural Network (PNN), and Clustering-Based Decision Tree methods are combined by averaging. Performance is assessed by precision, recall, accuracy, and error rate. The proposed model achieves a multi-label classification accuracy of 90.88%, higher than those of the previous techniques PCT (65.57%), HOMER (70.66%), and ML-Forest (82.29%).
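The ADASYN step described above can be sketched in its standard single-label form: each minority sample's learning difficulty r_i is the fraction of majority samples among its k nearest neighbours, G = (n_majority − n_minority) · β synthetic samples are allocated in proportion to r_i, and each synthetic point is interpolated toward a random minority neighbour. This is a hedged illustration; how the paper adapts ADASYN to multi-label data is not described in the abstract, and `beta`, `k`, and `seed` are assumed parameters.

```python
import math
import random

def adasyn(X, y, minority, beta=1.0, k=5, seed=0):
    """Return a list of synthetic minority-class samples (single-label ADASYN)."""
    rng = random.Random(seed)
    mino = [i for i, label in enumerate(y) if label == minority]
    n_maj = len(y) - len(mino)
    G = (n_maj - len(mino)) * beta  # total number of samples to generate
    if G <= 0 or len(mino) < 2:
        return []

    def nearest(i, pool):
        # indices of the k nearest neighbours of X[i] within `pool`, excluding i
        return sorted((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))[:k]

    # Learning difficulty: share of majority samples among all-data neighbours.
    r = []
    for i in mino:
        nb = nearest(i, range(len(X)))
        r.append(sum(y[j] != minority for j in nb) / len(nb))
    total = sum(r)
    if total == 0:
        return []  # no minority sample lies near the majority class
    synthetic = []
    for i, ri in zip(mino, r):
        neighbours = nearest(i, mino)
        for _ in range(round(ri / total * G)):  # harder samples get more points
            z = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[z])])
    return synthetic

X = [[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2], [3, 3], [2.5, 2.5], [1, 1], [1.5, 1.5]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
s = adasyn(X, y, minority=1)
print(len(s))  # 3
```

Unlike SMOTE, which spreads synthetic points uniformly over the minority class, the allocation by r_i concentrates them near the class boundary; interpolating only toward minority neighbours keeps every synthetic point inside the minority region.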
Affiliation(s)
- M. Priyadharshini
- Department of Computer Science Engineering, Nalla Malla Reddy Engineering College, Hyderabad 500088, Telangana, India
- A. Faritha Banu
- Department of Computer Science, Karpagam Academy of Higher Education, Coimbatore 631027, Tamil Nadu, India
- Bhisham Sharma
- Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura 140401, Punjab, India
- Subrata Chowdhury
- Department of Computer Science and Engineering, Sreenivasa Institute of Technology and Management Studies, Chittoor 517127, Andhra Pradesh, India
- Khaled Rabie
- Department of Engineering, Manchester Metropolitan University, Manchester M15GD, UK
- Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
- Thokozani Shongwe
- Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
11
Assafo M, Städter JP, Meisel T, Langendörfer P. On the Stability and Homogeneous Ensemble of Feature Selection for Predictive Maintenance: A Classification Application for Tool Condition Monitoring in Milling. Sensors (Basel) 2023; 23:4461. [PMID: 37177665 PMCID: PMC10181710 DOI: 10.3390/s23094461] [Received: 03/24/2023] [Revised: 04/29/2023] [Accepted: 04/29/2023] [Indexed: 05/15/2023]
Abstract
Feature selection (FS) is an essential step in many machine learning-based predictive maintenance (PdM) applications, covering various industrial processes, components, and monitoring tasks. The selected features not only serve as inputs to the learning models but can also influence further decisions and analyses, e.g., sensor selection and the understandability of the PdM system. Hence, before deploying a PdM system, it is crucial to examine the reproducibility and robustness of the selected features under variations in the input data. This is particularly critical for real-world datasets with a low sample-to-dimension ratio (SDR). However, to the best of our knowledge, the stability of FS methods under data variations has not yet been considered in the field of PdM. This paper addresses this issue with an application to tool condition monitoring in milling, where classifiers based on support vector machines and random forests were employed. We used five-fold cross-validation to evaluate three popular filter-based FS methods, namely Fisher score, minimum redundancy maximum relevance (mRMR), and ReliefF, in terms of both stability and macro-F1. Further, for each method, we investigated the impact of a homogeneous FS ensemble on both performance indicators. To gain broad insights, we used four milling datasets (two obtained from our experiments and two from NASA's repository), which differ in operating conditions, sensors, SDR, number of classes, etc. For each dataset, the study was conducted for two individual sensors and their fusion. Among the conclusions: (1) Different FS methods can yield comparable macro-F1 yet considerably different FS stability values. (2) Fisher score (single and/or ensemble) is superior in most cases. (3) mRMR's stability is overall the lowest and the most variable across settings (e.g., sensor(s), subset cardinality), and mRMR benefits the most from the ensemble.
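Feature-selection stability, as studied above, is often quantified by comparing the subsets selected on different folds. A minimal sketch, assuming the mean pairwise Jaccard similarity as the stability measure (other indices, such as Kuncheva's, are also common, and the abstract does not say which one the authors used) and mean-rank aggregation as one simple way to build a homogeneous ensemble from rankings obtained on perturbed data; all names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two feature subsets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def selection_stability(subsets):
    """Mean pairwise Jaccard similarity; 1.0 means identical selections on every fold."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def mean_rank_aggregate(rankings, k):
    """Homogeneous ensemble: average each feature's position over rankings of the
    same features (e.g., from bootstrap samples) and keep the k best."""
    avg = {f: sum(r.index(f) for r in rankings) / len(rankings) for f in rankings[0]}
    return sorted(avg, key=lambda f: (avg[f], f))[:k]

folds = [[0, 1, 2], [0, 1, 3], [0, 1, 2]]
print(selection_stability(folds))  # 0.6666666666666666
```

Two methods can reach the same macro-F1 while one of them picks a different subset on every fold; a stability score like this makes that difference visible.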
Affiliation(s)
- Maryam Assafo
- Department of Wireless Systems, Brandenburg University of Technology Cottbus-Senftenberg, 03046 Cottbus, Germany
- Jost Philipp Städter
- Department of Automation Technology, Brandenburg University of Technology Cottbus-Senftenberg, 03046 Cottbus, Germany
- Peter Langendörfer
- Department of Wireless Systems, Brandenburg University of Technology Cottbus-Senftenberg, 03046 Cottbus, Germany
- IHP-Leibniz-Institut für innovative Mikroelektronik, 15236 Frankfurt, Germany
12
Unsupervised feature selection through combining graph learning and ℓ2,0-norm constraint. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2022.11.156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/09/2022]
13
Sun L, Chen Y, Ding W, Xu J, Ma Y. AMFSA: Adaptive fuzzy neighborhood-based multilabel feature selection with ant colony optimization. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
14
Hu Y, Lu M, Li X, Cai B. Differential evolution based on network structure for feature selection. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.03.144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
15
Pan X, Liu C, Feng T, Qi XS. A multi-objective based radiomics feature selection method for response prediction following radiotherapy. Phys Med Biol 2023; 68. [PMID: 36758241 DOI: 10.1088/1361-6560/acbadf] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 02/09/2023] [Indexed: 02/11/2023]
Abstract
Objective. Radiomics contains a large amount of mineable information extracted from medical images, which is important for predicting treatment response in personalized treatment. Because radiomics analyses generally involve high-dimensional and redundant features, feature selection is essential for constructing prediction models. Approach. We proposed a novel multi-objective radiomics feature selection method (MRMOPSO), in which the number of features, sensitivity, and specificity are jointly considered as optimization objectives. MRMOPSO innovates in three aspects: (1) the Fisher score is used to initialize the population to speed up convergence; (2) min-redundancy particle generation operations reduce the redundancy between radiomics features, and a truncation strategy is introduced to further reduce the number of features; (3) particle selection operations guided by elitism strategies improve the local search ability of the algorithm. We evaluated the effectiveness of MRMOPSO on a multi-institution oropharyngeal cancer dataset from The Cancer Imaging Archive. A total of 357 patients were used for model training and cross-validation, and an additional 64 patients were used for evaluation. Main results. Our method achieved AUCs of 0.82 and 0.84 for cross-validation and the independent dataset, respectively. Compared with classical feature selection methods, the AUC of MRMOPSO is significantly higher than those of Lasso (AUC = 0.74, p-value = 0.02), the minimal-redundancy-maximal-relevance criterion (mRMR) (AUC = 0.73, p-value = 0.05), F-score (AUC = 0.48, p-value < 0.01), and mutual information (AUC = 0.69, p-value < 0.01). Compared with single-objective methods, the AUC of MRMOPSO is 12% higher than those of the genetic algorithm (GA) (AUC = 0.68, p-value = 0.02) and particle swarm optimization (AUC = 0.72, p-value = 0.05). Compared with other multi-objective feature selection methods, the AUC of MRMOPSO is 14% higher than those of multiple objective particle swarm optimization (MOPSO) (AUC = 0.68, p-value = 0.02) and the nondominated sorting genetic algorithm II (NSGA2) (AUC = 0.70, p-value = 0.03). Significance. We proposed a multi-objective radiomics feature selection method. Compared with conventional feature reduction algorithms, the proposed algorithm effectively reduced the feature dimension and achieved superior performance, with improved sensitivity and specificity, for response prediction in radiotherapy.
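Because MRMOPSO treats the number of features, sensitivity, and specificity as joint objectives, candidate feature subsets are compared by Pareto dominance rather than by a single score. The sketch below shows only that dominance test and a non-dominated filter of the kind used to maintain an archive in MOPSO-style methods; the Fisher-score initialization, truncation, and elitist selection steps are not reproduced, and the `Candidate` type and function names are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    n_features: int     # objective to minimize
    sensitivity: float  # objective to maximize
    specificity: float  # objective to maximize


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = (a.n_features <= b.n_features
                and a.sensitivity >= b.sensitivity
                and a.specificity >= b.specificity)
    strictly_better = (a.n_features < b.n_features
                       or a.sensitivity > b.sensitivity
                       or a.specificity > b.specificity)
    return no_worse and strictly_better


def pareto_front(pop: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (order preserved)."""
    # A candidate never dominates itself, so no self-exclusion is needed.
    return [c for c in pop if not any(dominates(o, c) for o in pop)]
```

For example, a 5-feature subset with sensitivity 0.8 and specificity 0.7 dominates a 10-feature subset with the same sensitivity and specificity, while it does not dominate a 5-feature subset that trades sensitivity (0.7) for specificity (0.9); both of the latter stay on the front.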
Affiliation(s)
- XiaoYing Pan
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, People's Republic of China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, Shaanxi 710121, People's Republic of China
- Chen Liu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, People's Republic of China
- TianHao Feng
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, People's Republic of China
- X Sharon Qi
- Department of Radiation Oncology, University of California Los Angeles, Los Angeles, CA 90095, United States of America
16
Semisupervised Bacterial Heuristic Feature Selection Algorithm for High-Dimensional Classification with Missing Labels. INT J INTELL SYST 2023. [DOI: 10.1155/2023/4196920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Abstract
Feature selection is a crucial method for discovering relevant features in high-dimensional data. However, most studies focus on completely labeled data, ignoring the frequent occurrence of missing labels in real-world problems. To address the high-dimensional and label-missing problems in data classification simultaneously, we propose a semisupervised bacterial heuristic feature selection algorithm. To tackle the label-missing problem, a k-nearest neighbor semisupervised learning strategy is designed to reconstruct missing labels. In addition, the bacterial heuristic algorithm is improved using hierarchical population initialization, dynamic learning, and elite population evolution strategies to enhance its ability to search over feature combinations. To verify the effectiveness of the proposed algorithm, three groups of comparison experiments on eight datasets are conducted against two traditional feature selection methods, four bacterial heuristic feature selection algorithms, and two swarm-based heuristic feature selection algorithms. Experimental results demonstrate that the proposed algorithm has clear advantages in terms of classification accuracy and the number of selected features.
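The k-nearest neighbor label reconstruction step can be illustrated with a simple majority vote. This is a minimal sketch under stated assumptions (Euclidean distance, only originally labeled samples vote, missing labels marked as `None`); the paper's actual strategy may differ, and `knn_fill_labels` is a hypothetical name.

```python
import math
from collections import Counter


def knn_fill_labels(X, y, k=3):
    """Fill each missing label (None) with the majority label of its k nearest labeled samples."""
    labeled = [i for i, lab in enumerate(y) if lab is not None]
    filled = list(y)  # leave the caller's label list untouched
    for i, lab in enumerate(y):
        if lab is not None:
            continue
        # Rank labeled samples by Euclidean distance to sample i.
        nearest = sorted(labeled, key=lambda j: math.dist(X[i], X[j]))[:k]
        filled[i] = Counter(y[j] for j in nearest).most_common(1)[0][0]
    return filled
```

Only originally labeled samples vote here, so a reconstructed label never influences another reconstruction; an iterative variant that feeds reconstructed labels back in is a design alternative.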
17
Sun L, Si S, Ding W, Xu J, Zhang Y. BSSFS: binary sparrow search algorithm for feature selection. INT J MACH LEARN CYB 2023. [DOI: 10.1007/s13042-023-01788-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
18
Rough sets-based tri-trade for partially labeled data. APPL INTELL 2023. [DOI: 10.1007/s10489-022-04405-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
19
Guo X, Tiwari P, Zou Q, Ding Y. Subspace projection-based weighted echo state networks for predicting therapeutic peptides. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
20
Kim YW, Lee S. Data Valuation Algorithm for Inertial Measurement Unit-Based Human Activity Recognition. SENSORS (BASEL, SWITZERLAND) 2022; 23:184. [PMID: 36616781 PMCID: PMC9823777 DOI: 10.3390/s23010184] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 12/19/2022] [Accepted: 12/20/2022] [Indexed: 06/17/2023]
Abstract
This paper proposes a data valuation algorithm for inertial measurement unit-based human activity recognition (IMU-based HAR) data based on meta reinforcement learning. Unlike previous studies that used feature-level input, the algorithm in this study adds a feature extraction structure to the data valuation algorithm, so it can receive raw-level inputs and still achieve excellent performance. As IMU-based HAR data are multivariate time series, the proposed algorithm extracts both local and global features by inserting a transformer encoder after the one-dimensional convolutional neural network (1D-CNN) backbone in the data value estimator. In addition, a 1D-CNN-based stacking ensemble, which exhibits excellent efficiency and performance on IMU-based HAR data, is used as the predictor to supervise model training. The Berg balance scale (BBS) IMU-based HAR dataset and the public datasets UCI-HAR, WISDM, and PAMAP2 are used for performance evaluation. The proposed algorithm shows excellent valuation performance on IMU-based HAR data: the rate of discovering corrupted data is higher than 96% on all datasets. In addition, classification performance is confirmed to improve when the discovered low-value data are suppressed.
Affiliation(s)
- Yeon-Wook Kim
- Department of Electrical and Computer Engineering, Inha University, Incheon 22212, Republic of Korea
- Sangmin Lee
- Department of Electrical and Computer Engineering, Inha University, Incheon 22212, Republic of Korea
- Department of Smart Engineering Program in Biomedical Science & Engineering, Inha University, Incheon 22212, Republic of Korea
21
Differential diagnosis of thyroid nodule capsules using random forest guided selection of image features. Sci Rep 2022; 12:21636. [PMID: 36517531 PMCID: PMC9751070 DOI: 10.1038/s41598-022-25788-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 12/05/2022] [Indexed: 12/15/2022]
Abstract
Microscopic evaluation of tissue sections stained with hematoxylin and eosin is the current gold standard for diagnosing thyroid pathology. Digital pathology is gaining momentum by providing pathologists with additional cues that complement traditional routes when making a diagnosis; it is therefore important to develop new image analysis methods that can extract image features with diagnostic potential. In this work, we use histogram and texture analysis to extract features from microscopic images acquired on thin sections of thyroid nodule capsules and demonstrate how these features enable the differential diagnosis of thyroid nodules. The targeted thyroid nodules are benign (i.e., follicular adenoma) and malignant (i.e., papillary thyroid carcinoma and its subtype arising within a follicular adenoma). Our results show that the considered image features enable the quantitative characterization of the collagen capsule surrounding thyroid nodules and, using random forest, provide an accurate classification of the nodule type.
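The abstract does not list the exact histogram and texture descriptors. As an illustration only, the sketch below computes three common first-order histogram statistics (mean, variance, entropy) and the Haralick contrast of a grey-level co-occurrence matrix (GLCM) for one pixel offset, on an image given as a list of rows of integer grey levels. Feature vectors of this kind could then be passed to a random forest classifier, which is not shown; the function names are hypothetical.

```python
import math
from collections import Counter


def histogram_features(img, levels):
    """First-order statistics of the normalized grey-level histogram."""
    pixels = [p for row in img for p in row]
    counts = Counter(pixels)
    probs = [counts[g] / len(pixels) for g in range(levels)]
    mean = sum(g * p for g, p in enumerate(probs))
    variance = sum((g - mean) ** 2 * p for g, p in enumerate(probs))
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)  # in bits
    return {"mean": mean, "variance": variance, "entropy": entropy}


def glcm_contrast(img, levels, dx=1, dy=0):
    """Haralick contrast of the normalized GLCM for pixel offset (dx, dy)."""
    glcm = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    n_pairs = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # count only in-bounds pixel pairs
                glcm[img[r][c]][img[r2][c2]] += 1
                n_pairs += 1
    return sum((i - j) ** 2 * glcm[i][j]
               for i in range(levels) for j in range(levels)) / n_pairs
```

Contrast is zero for a uniform region and grows with abrupt grey-level changes between neighboring pixels, which is why it is a common proxy for texture coarseness.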
22
Li Y, Hu X, Pedrycz W, Yang F, Liu Z. Multivariable fuzzy rule-based models and their granular generalization: A visual interpretable framework. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022]
23
Multi-label feature selection based on label distribution and neighborhood rough set. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
24
Deng T, Huang Y, Yang G, Wang C. Pointwise mutual information sparsely embedded feature selection. Int J Approx Reason 2022. [DOI: 10.1016/j.ijar.2022.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
25
Middha K, Mittal A. An effective feature selection method for type 2 diabetes mellitus detection using gene expression data. INTELLIGENT DECISION TECHNOLOGIES 2022. [DOI: 10.3233/idt-220077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Type 2 diabetes mellitus (T2DM) is a chronic disease caused by an insulin disorder. Decreased insulin secretion increases the blood glucose level, and the body cannot respond adequately to the high glucose level. People with T2DM either do not produce enough insulin or are resistant to it. Symptoms of T2DM include increased hunger, thirst, fatigue, frequent urination, and blurred vision, although in some cases there are no symptoms. Commonly used treatments for T2DM are exercise, diet, insulin therapy, and medication. In this paper, a hybrid deep learning scheme based on the Competitive Multi-Verse Rider Optimizer (CMVRO) is devised for T2DM detection. The hybrid deep learning scheme involves two classifiers, a Rider-based Neural Network (RideNN) and a Deep Residual Network (DRN). Moreover, various feature selection approaches are compared for T2DM detection, namely Tanimoto similarity, Chi-square (Chi-2), Fisher score (FS), Linear Discriminant Analysis (LDA), Random Forest (RF), and Support Vector Machine recursive feature elimination (SVM-RFE). Among these, the Tanimoto similarity approach attained the best performance, with a testing accuracy, sensitivity, and specificity of 0.932, 0.932, and 0.914, respectively.
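The abstract does not describe how Tanimoto similarity is turned into a feature selector. One plausible reading, shown here purely as a hypothetical sketch, scores each gene by the Tanimoto coefficient T(a, b) = a·b / (|a|² + |b|² - a·b) between its min-max scaled expression vector and the 0/1 class-label vector, and keeps the top-k genes. The scaling choice and all function names are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient a.b / (|a|^2 + |b|^2 - a.b) for real-valued vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 1.0  # two all-zero vectors count as identical


def min_max(col):
    """Scale a column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0.0] * len(col)
    return [(v - lo) / (hi - lo) for v in col]


def rank_by_tanimoto(X, y, k):
    """Return the k feature indices most similar to the 0/1 label vector, plus all scores."""
    n_features = len(X[0])
    scores = [tanimoto(min_max([row[j] for row in X]), y) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k], scores
```

The coefficient equals 1 only when the scaled expression vector matches the label vector exactly, so genes whose expression rises in positive samples and stays low in negatives rank highest.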
26
Noise-resistant multilabel fuzzy neighborhood rough sets for feature subset selection. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
27
Sun L, Wang X, Ding W, Xu J. TSFNFR: Two-stage fuzzy neighborhood-based feature reduction with binary whale optimization algorithm for imbalanced data classification. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109849] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
28
Sun L, Wang X, Ding W, Xu J, Meng H. TSFNFS: two-stage-fuzzy-neighborhood feature selection with binary whale optimization algorithm. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01653-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2022]
29
AFNFS: Adaptive fuzzy neighborhood-based feature selection with adaptive synthetic over-sampling for imbalanced data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.08.118] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
30
Attribute Reduction Based on Lift and Random Sampling. Symmetry (Basel) 2022. [DOI: 10.3390/sym14091828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
As one of the key topics in the development of neighborhood rough sets, attribute reduction has attracted extensive attention because of its practicability and interpretability for dimension reduction or feature selection. Although a random sampling strategy has been introduced into attribute reduction to avoid overfitting, uncontrollable sampling may still affect the efficiency of searching for a reduct. By utilizing the inherent characteristics of each label, the Multi-label learning with Label specIfic FeaTures (Lift) algorithm can improve the performance of mathematical modeling. Therefore, this paper uses the Lift algorithm to guide sampling and reduce its uncontrollability. An attribute reduction algorithm based on Lift and random sampling, called ARLRS, is proposed, which aims to improve the efficiency of searching for a reduct. First, the Lift algorithm is used to choose samples from the dataset as members of the first group, and the reduct of the first group is calculated. Second, a random sampling strategy is used to divide the remaining samples into groups with a symmetric structure. Finally, the reducts are calculated group by group, guided by maintaining the classification performance of the reduct. Compared with five other attribute reduction strategies based on rough set theory over 17 University of California Irvine (UCI) datasets, the experimental results show that: (1) the ARLRS algorithm can significantly reduce the time consumption of searching for a reduct; (2) the reduct derived from the ARLRS algorithm provides satisfactory performance in classification tasks.
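ARLRS itself (Lift-guided first group, random grouping, group-wise reducts) is not reproduced here. The sketch below shows the underlying building block that neighborhood rough set reduction methods typically rely on: the dependency degree, i.e., the fraction of samples whose δ-neighborhood contains only samples of their own class (the positive region), and a greedy forward search that adds the feature giving the largest dependency until no feature improves it. The Chebyshev distance, δ = 0.1, features pre-scaled to [0, 1], and all names are assumptions.

```python
def neighborhood(X, i, feats, delta):
    """Indices of samples within Chebyshev distance delta of sample i on `feats`."""
    return [j for j in range(len(X))
            if max(abs(X[i][f] - X[j][f]) for f in feats) <= delta]


def dependency(X, y, feats, delta):
    """Fraction of samples whose delta-neighborhood is class-pure (positive region)."""
    if not feats:
        return 0.0
    pos = sum(1 for i in range(len(X))
              if all(y[j] == y[i] for j in neighborhood(X, i, feats, delta)))
    return pos / len(X)


def greedy_reduct(X, y, delta=0.1, eps=1e-9):
    """Forward selection: add the feature that most increases the dependency degree."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        gains = {f: dependency(X, y, selected + [f], delta) for f in remaining}
        f_star = max(remaining, key=lambda f: gains[f])
        if gains[f_star] - best <= eps:
            break  # no remaining feature increases the dependency degree
        selected.append(f_star)
        remaining.remove(f_star)
        best = gains[f_star]
    return selected, best
```

Each dependency evaluation here is quadratic in the number of samples, which is the cost that sampling-based strategies such as ARLRS aim to reduce by computing reducts on smaller groups.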
31
Peng X, Wang P, Xia S, Wang C, Chen W. VPGB: A granular-ball based model for attribute reduction and classification with label noise. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.08.066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
32
Zhang X, Jiang Z, Xu W. Feature selection using a weighted method in interval-valued decision information systems. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03987-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
33
Multi-Target Rough Sets and Their Approximation Computation with Dynamic Target Sets. INFORMATION 2022. [DOI: 10.3390/info13080385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Multi-label learning has become a hot topic in recent years and has attracted scholars' attention, including work that applies the rough set model to multi-label learning. Existing works that apply the rough set model to multi-label learning usually adapt it from a single decision table to a multi-decision table with a conservative strategy. However, applying the rough set model to multi-label learning requires considering multiple target concepts, and labels are naturally correlated. To this end, this paper proposes a rough set model that has multiple target concepts and considers the similarity relationships among target concepts to capture the correlation among labels. The properties of the proposed model are also investigated. A rough set model with multiple target concepts can handle datasets with multiple decisions and has inherent advantages when applied to multi-label learning. Moreover, we consider how to compute the approximations of GMTRSs in static and dynamic situations, when a target concept is added or removed, and derive the corresponding algorithms. The efficiency and validity of the designed algorithms are verified by experiments.
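For readers unfamiliar with rough set approximations, the sketch below computes the classical (Pawlak) lower and upper approximations of several target concepts under the indiscernibility relation induced by a set of attributes. It does not model the similarity between target concepts that the proposed GMTRS model adds, nor the incremental updates; the table layout (object id mapped to an attribute dictionary) and function names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Group object ids that share the same values on `attrs` (indiscernibility)."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def approximations(table, attrs, target):
    """Pawlak lower and upper approximations of the set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attrs):
        if block <= target:  # block lies entirely inside the target concept
            lower |= block
        if block & target:   # block overlaps the target concept
            upper |= block
    return lower, upper


def multi_target_approximations(table, attrs, targets):
    """Approximations for each named target concept (e.g., one per label)."""
    return {name: approximations(table, attrs, t) for name, t in targets.items()}
```

Objects 1 and 2 below are indiscernible on attributes a and b, so a target containing only object 1 cannot include it in its lower approximation; the gap between lower and upper approximations is the boundary region.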
34
Yang X, Chen H, Li T, Luo C. A noise-aware fuzzy rough set approach for feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
35
Azam B, Mandal R, Verma B. Relationship aware context adaptive deep learning for image parsing. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.05.125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
36
A Random Approximate Reduct-Based Ensemble Learning Approach and Its Application in Software Defect Prediction. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
38
Gunduz H. Malware detection framework based on graph variational autoencoder extracted embeddings from API-call graphs. PeerJ Comput Sci 2022; 8:e988. [PMID: 35634097 PMCID: PMC9137949 DOI: 10.7717/peerj-cs.988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Accepted: 04/29/2022] [Indexed: 06/15/2023]
Abstract
Malware harms the confidentiality and integrity of information, causing material and moral damage to institutions or individuals. This study proposed a malware detection model based on API-call graphs and used a Graph Variational Autoencoder (GVAE) to reduce the size of the graph node features extracted from Android APK files. The GVAE-reduced embeddings were fed to linear (SVM) and ensemble (LightGBM) models to finalize the malware detection process. To validate the effectiveness of the GVAE-reduced features, recursive feature elimination (RFE) and Fisher score (FS) were applied to select informative feature sets of the same sizes as the GVAE-reduced embeddings. Among the RFE and FS selections, LightGBM with 50 RFE-selected features achieved the highest accuracy (0.907) and F-measure (0.852). Using the GVAE-reduced embeddings in classification increased the accuracy of both models by approximately 4%. A similar increase occurred in the F-measure, which directly indicates improved discriminative power of the models. The final experiment, which combined the strengths of RFE selection and GVAE, yielded a further performance increase over GVAE-reduced embeddings alone: with 30 relevant features selected by RFE from the combination of all GVAE embeddings, LightGBM achieved an accuracy of 0.967.
Affiliation(s)
- Hakan Gunduz
- Software Engineering Department, Kocaeli University, Kocaeli, Marmara, Turkey
39
Online group streaming feature selection using entropy-based uncertainty measures for fuzzy neighborhood rough sets. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00763-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Online group streaming feature selection, as an essential online processing method, can handle dynamic feature selection tasks by considering the original group structure of the features. Owing to the fuzziness and uncertainty of the feature stream, some existing methods are unstable and yield low predictive accuracy. To address these issues, this paper presents a novel online group streaming feature selection method (FNE-OGSFS) using fuzzy neighborhood entropy-based uncertainty measures. First, a separability measure integrating the dependency degree with the coincidence degree is proposed and introduced into the fuzzy neighborhood rough set model to define a new fuzzy neighborhood entropy. Second, inspired by both the algebra and information views, several fuzzy neighborhood entropy-based uncertainty measures are investigated and their properties are derived. Furthermore, the optimal features in each group are selected to flow into the feature space according to their significance, and features with interactions are retained. Then, all selected features are re-evaluated by the Lasso model to discard redundant features. Finally, an online group streaming feature selection algorithm is designed. Experimental results compared with eight representative methods on thirteen datasets show that FNE-OGSFS achieves better overall performance.
40
ASFS: A novel streaming feature selection for multi-label data based on neighborhood rough set. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03366-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/17/2023]
44
Sun L, Zhang J, Ding W, Xu J. Feature reduction for imbalanced data classification using similarity-based feature clustering with adaptive weighted K-nearest neighbors. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.02.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/19/2023]
45
Sun L, Si S, Zhao J, Xu J, Lin Y, Lv Z. Feature selection using binary monarch butterfly optimization. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03554-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022]
46
Incremental feature selection by sample selection and feature-based accelerator. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
48
Chen W, Zhang Q, Dai Y. Sequential multi-class three-way decisions based on cost-sensitive learning. Int J Approx Reason 2022. [DOI: 10.1016/j.ijar.2022.03.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
49
Sun L, Wang T, Ding W, Xu J, Tan A. Two-stage-neighborhood-based multilabel classification for incomplete data with missing labels. INT J INTELL SYST 2022. [DOI: 10.1002/int.22861] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/17/2023]
Affiliation(s)
- Lin Sun
- College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Engineering Laboratory of Intelligence Business and Internet of Things Technology, Henan Normal University, Xinxiang, China
- Tianxiang Wang
- College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Weiping Ding
- School of Information Science and Technology, Nantong University, Nantong, China
- Jiucheng Xu
- College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Anhui Tan
- School of Mathematics, Physics, and Information Science, Zhejiang Ocean University, Zhoushan, China
50
Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning. Sci Rep 2022; 12:3257. [PMID: 35228648 PMCID: PMC8885834 DOI: 10.1038/s41598-022-07337-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Accepted: 02/16/2022] [Indexed: 11/26/2022]
Abstract
Detecting fraud related to electricity consumption is usually difficult because the input datasets are sometimes unreliable due to missing and inconsistent records, faults, misinterpretation of meter-reading remarks, status, etc. In this paper, we obtain meaningful insights into fraud detection using real datasets of Tunisian electricity consumption recorded by conventional meters. We propose an extensive feature engineering approach using structured query language (SQL) analytic functions. Furthermore, a double merging of datasets reveals more dimensions of the data, allowing better detection of irregularities in consumption. We analyze the results of several machine learning (ML) algorithms that handle weakly correlated features and highly unbalanced datasets. The skewness of the target is treated as a regular characteristic of the input data because most consumers are honest and only a small portion attempt to mislead utility companies by tampering with metering devices. Our fraud detection solutions combine classifiers with an anomaly detection feature obtained from an unsupervised ML algorithm (Isolation Forest) and extensive feature engineering using SQL analytic functions on large datasets. Several feature processing techniques raised the Area Under the Curve score of the Decision Tree algorithm from 0.68 to 0.99.
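The paper does not list its SQL queries. As a hypothetical illustration of what SQL analytic (window) functions contribute to this kind of feature engineering, the sketch below builds an in-memory SQLite table of meter readings (assumed columns `client_id`, `period`, `kwh`) and derives, per client, the change from the previous reading (`LAG`), a three-period moving average, and the ratio to the client's mean consumption. It uses Python's standard `sqlite3` module, which needs an underlying SQLite library of version 3.25 or later for window functions; the table, column, and function names are assumptions.

```python
import sqlite3

# Hypothetical readings: (client_id, billing period, consumption in kWh).
READINGS = [
    ("c1", 1, 100.0), ("c1", 2, 110.0), ("c1", 3, 20.0),
    ("c2", 1, 50.0), ("c2", 2, 55.0), ("c2", 3, 60.0),
]

QUERY = """
SELECT client_id,
       period,
       kwh,
       kwh - LAG(kwh) OVER (PARTITION BY client_id ORDER BY period) AS delta_prev,
       AVG(kwh) OVER (PARTITION BY client_id ORDER BY period
                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_last3,
       kwh / AVG(kwh) OVER (PARTITION BY client_id) AS ratio_to_client_mean
FROM readings
ORDER BY client_id, period
"""


def window_features(rows):
    """Return one feature row per reading, computed with SQL window functions."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (client_id TEXT, period INTEGER, kwh REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

A sudden drop such as client c1's period-3 reading shows up as a large negative `delta_prev` and a low `ratio_to_client_mean`, and such columns can be fed to a classifier alongside other features. The first reading of each client has no predecessor, so its `delta_prev` is NULL (`None` in Python).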