1. Damtew YG, Chen H. SMMO-CoFS: Synthetic Multi-minority Oversampling with Collaborative Feature Selection for Network Intrusion Detection System. Int J Comput Intell Syst 2023. DOI: 10.1007/s44196-022-00171-9.
Abstract
Researchers publish various studies to improve the performance of network intrusion detection systems. However, there is still a high false alarm rate and missed intrusions due to class imbalance in the multi-class dataset. This imbalanced class distribution results in low detection accuracy for the minority classes. This paper proposes a Synthetic Multi-minority Oversampling (SMMO) framework integrated with a collaborative feature selection (CoFS) approach for network intrusion detection systems. Our framework aims to increase the detection accuracy of the extreme minority classes (i.e., user-to-root and remote-to-local attacks) by improving the dataset's class distribution and selecting relevant features. In our framework, SMMO generates synthetic data and iteratively over-samples the multi-minority classes, and the collaboration of correlation-based feature selection with an evolutionary algorithm selects essential features. We evaluate our framework with random forest, J48, BayesNet, and AdaBoostM1 classifiers. On the multi-class NSL-KDD dataset, the experimental results show that the proposed framework significantly improves the detection accuracy of the extreme minority classes compared with other approaches.
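The pipeline described in this abstract oversamples only the minority classes and then reduces the feature set before training a classifier. The sketch below is a rough illustration of that pipeline shape, not the authors' SMMO-CoFS code: it substitutes imbalanced-learn's SMOTE for SMMO and a univariate ANOVA filter for the correlation-based/evolutionary CoFS step, and the class labels, sample counts, and parameters are made-up placeholders.

```python
# Hedged sketch: NOT the SMMO-CoFS implementation from the paper above.
# It only illustrates "oversample the extreme minority classes, then select
# features, then train a classifier" with stand-in components.
from collections import Counter

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif


def oversample_minorities_and_select(X, y, minority_labels, target_count, k_features):
    """Grow only the listed minority classes to roughly target_count samples each,
    then keep the k_features highest-scoring features (ANOVA F-test stand-in for
    the paper's correlation-based + evolutionary selection)."""
    counts = Counter(y)
    # SMOTE rejects requests that would shrink a class, so never ask for fewer
    # samples than a class already has.
    strategy = {c: max(counts[c], target_count) for c in minority_labels}
    X_res, y_res = SMOTE(sampling_strategy=strategy, k_neighbors=3,
                         random_state=0).fit_resample(X, y)
    selector = SelectKBest(score_func=f_classif, k=k_features).fit(X_res, y_res)
    return selector.transform(X_res), y_res, selector


# Toy usage: labels 3 and 4 play the role of the extreme minority classes
# (U2R/R2L in NSL-KDD); the data itself is random and purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 40))
y = np.array([0] * 1500 + [1] * 300 + [2] * 150 + [3] * 30 + [4] * 20)
X_sel, y_res, _ = oversample_minorities_and_select(
    X, y, minority_labels=[3, 4], target_count=300, k_features=20)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_sel, y_res)
```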
2. Distance-based arranging oversampling technique for imbalanced data. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-07828-8.
3. Gowri R, Rathipriya R. Non-swarm-based computational approach for mining cancer drug target modules in protein interaction network. Med Biol Eng Comput 2022; 60:1947-1976. DOI: 10.1007/s11517-022-02574-4.
4. Automated method for real-time AMD screening of fundus images dedicated for mobile devices. Med Biol Eng Comput 2022; 60:1449-1479. DOI: 10.1007/s11517-022-02546-8.
5. RDPVR: Random Data Partitioning with Voting Rule for Machine Learning from Class-Imbalanced Datasets. Electronics 2022. DOI: 10.3390/electronics11020228.
Abstract
Since most classifiers are biased toward the dominant class, class imbalance is a challenging problem in machine learning. The most popular approaches to solving this problem include oversampling minority examples and undersampling majority examples. Oversampling may increase the probability of overfitting, whereas undersampling eliminates examples that may be crucial to the learning process. We present a linear-time resampling method based on random data partitioning and a majority voting rule to address both concerns: an imbalanced dataset is partitioned into a number of small subdatasets, each of which must be class balanced. A separate classifier is then trained on each subdataset, and the final classification is obtained by applying the majority voting rule to the outputs of all the trained models. We compared the performance of the proposed method with some of the most well-known oversampling and undersampling methods, employing a range of classifiers, on 33 benchmark class-imbalanced machine learning datasets. The classification results produced by classifiers trained on the data generated by the proposed method were comparable to those of most of the resampling methods tested, with the exception of SMOTEFUNA, an oversampling method that increases the probability of overfitting. The proposed method produced results comparable to the Easy Ensemble (EE) undersampling method. Consequently, for learning from class-imbalanced datasets, we advocate using either EE or our method.
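A minimal sketch of the partition-and-vote idea summarized in this abstract, not the published RDPVR implementation: the majority class is randomly split into minority-sized chunks, each chunk plus all minority examples forms one balanced subdataset, one decision tree (a stand-in base learner) is trained per subdataset, and predictions are combined by majority vote. The helper names and the binary integer-label setup are assumptions for illustration.

```python
# Hedged sketch of random data partitioning with a majority voting rule;
# this is not the RDPVR code from the paper above. Binary case only,
# with integer labels (0 = majority, 1 = minority).
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_partition_ensemble(X, y, minority_label, random_state=0):
    """Split the majority class into minority-sized chunks, pair each chunk
    with all minority examples, and train one tree per balanced subdataset."""
    rng = np.random.default_rng(random_state)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = rng.permutation(np.flatnonzero(y != minority_label))
    n_parts = max(1, len(maj_idx) // len(min_idx))
    models = []
    for chunk in np.array_split(maj_idx, n_parts):
        idx = np.concatenate([chunk, min_idx])  # class-balanced subdataset
        models.append(
            DecisionTreeClassifier(random_state=random_state).fit(X[idx], y[idx]))
    return models


def predict_majority_vote(models, X):
    """Combine the per-model predictions by a per-sample majority vote."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


# Toy usage on random, purely illustrative data (1000 majority, 100 minority).
rng = np.random.default_rng(1)
X = rng.normal(size=(1100, 10))
y = np.array([0] * 1000 + [1] * 100)
models = fit_partition_ensemble(X, y, minority_label=1)
y_pred = predict_majority_vote(models, X)
```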