1
|
Musthafa MB, Huda S, Kodera Y, Ali MA, Araki S, Mwaura J, Nogami Y. Optimizing IoT Intrusion Detection Using Balanced Class Distribution, Feature Selection, and Ensemble Machine Learning Techniques. SENSORS (BASEL, SWITZERLAND) 2024; 24:4293. [PMID: 39001072 PMCID: PMC11244377 DOI: 10.3390/s24134293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/29/2024] [Revised: 06/26/2024] [Accepted: 06/27/2024] [Indexed: 07/16/2024]
Abstract
Internet of Things (IoT) devices are leading to advancements in innovation, efficiency, and sustainability across various industries. However, as the number of connected IoT devices increases, the risk of intrusion becomes a major concern in IoT security. To prevent intrusions, it is crucial to implement intrusion detection systems (IDSs) that can detect and prevent such attacks. IDSs are a critical component of cybersecurity infrastructure. They are designed to detect and respond to malicious activities within a network or system. Traditional IDS methods rely on predefined signatures or rules to identify known threats, but these techniques may struggle to detect novel or sophisticated attacks. The implementation of IDSs with machine learning (ML) and deep learning (DL) techniques has been proposed to improve IDSs' ability to detect attacks. This will enhance overall cybersecurity posture and resilience. However, ML and DL techniques face several issues that may impact the models' performance and effectiveness, such as overfitting and the effects of unimportant features on finding meaningful patterns. To ensure better performance and reliability of machine learning models in IDSs when dealing with new and unseen threats, the models need to be optimized. This can be done by addressing overfitting and implementing feature selection. In this paper, we propose a scheme to optimize IoT intrusion detection by using class balancing and feature selection for preprocessing. We evaluated the experiment on the UNSW-NB15 dataset and the NSL-KD dataset by implementing two different ensemble models: one using a support vector machine (SVM) with bagging and another using long short-term memory (LSTM) with stacking. The results of the performance and the confusion matrix show that the LSTM stacking with analysis of variance (ANOVA) feature selection model is a superior model for classifying network attacks. It has remarkable accuracies of 96.92% and 99.77% and overfitting values of 0.33% and 0.04% on the two datasets, respectively. The model's ROC is also shaped with a sharp bend, with AUC values of 0.9665 and 0.9971 for the UNSW-NB15 dataset and the NSL-KD dataset, respectively.
Collapse
Affiliation(s)
- Muhammad Bisri Musthafa
- Graduate School of Environmental, Life, Natural Science and Technology, Okayama University, Okayama 700-8530, Japan
| | - Samsul Huda
- Green Innovation Center, Okayama University, Okayama 700-8530, Japan
| | - Yuta Kodera
- Graduate School of Environmental, Life, Natural Science and Technology, Okayama University, Okayama 700-8530, Japan
| | - Md Arshad Ali
- Faculty of CSE, Hajee Mohammad Danesh Science and Technology University, Dinajpur 5200, Bangladesh
| | - Shunsuke Araki
- Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, Fukuoka 804-8550, Japan
| | - Jedidah Mwaura
- Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, Fukuoka 804-8550, Japan
| | - Yasuyuki Nogami
- Graduate School of Environmental, Life, Natural Science and Technology, Okayama University, Okayama 700-8530, Japan
| |
Collapse
|
2
|
Towards an Effective Intrusion Detection Model Using Focal Loss Variational Autoencoder for Internet of Things (IoT). SENSORS 2022; 22:s22155822. [PMID: 35957379 PMCID: PMC9371235 DOI: 10.3390/s22155822] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Revised: 06/21/2022] [Accepted: 06/24/2022] [Indexed: 12/04/2022]
Abstract
As the range of security attacks increases across diverse network applications, intrusion detection systems are of central interest. Such detection systems are more crucial for the Internet of Things (IoT) due to the voluminous and sensitive data it produces. However, the real-world network produces imbalanced traffic including different and unknown attack types. Due to this imbalanced nature of network traffic, the traditional learning-based detection techniques suffer from lower overall detection performance, higher false-positive rate, and lower minority-class attack detection rates. To address the issue, we propose a novel deep generative-based model called Class-wise Focal Loss Variational AutoEncoder (CFLVAE) which overcomes the data imbalance problem by generating new samples for minority attack classes. Furthermore, we design an effective and cost-sensitive objective function called Class-wise Focal Loss (CFL) to train the traditional Variational AutoEncoder (VAE). The CFL objective function focuses on different minority class samples and scrutinizes high-level feature representation of observed data. This leads the VAE to generate more realistic, diverse, and quality intrusion data to create a well-balanced intrusion dataset. The balanced dataset results in improving the intrusion detection accuracy of learning-based classifiers. Therefore, a Deep Neural Network (DNN) classifier with a unique architecture is then trained using the balanced intrusion dataset to enhance the detection performance. Moreover, we utilize a challenging and highly imbalanced intrusion dataset called NSL-KDD to conduct an extensive experiment with the proposed model. The results demonstrate that the proposed CFLVAE with DNN (CFLVAE-DNN) model obtains promising performance in generating realistic new intrusion data samples and achieves superior intrusion detection performance. Additionally, the proposed CFLVAE-DNN model outperforms several state-of-the-art data generation and traditional intrusion detection methods. Specifically, the CFLVAE-DNN achieves 88.08% overall intrusion detection accuracy and 3.77% false positive rate. More significantly, it obtains the highest low-frequency attack detection rates for U2R (79.25%) and R2L (67.5%) against all the state-of-the-art algorithms.
Collapse
|
3
|
XGBoost for Imbalanced Multiclass Classification-Based Industrial Internet of Things Intrusion Detection Systems. SUSTAINABILITY 2022. [DOI: 10.3390/su14148707] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
The Industrial Internet of Things (IIoT) has advanced digital technology and the fastest interconnection, which creates opportunities to substantially grow industrial businesses today. Although IIoT provides promising opportunities for growth, the massive sensor IoT data collected are easily attacked by cyber criminals. Hence, IIoT requires different high security levels to protect the network. An Intrusion Detection System (IDS) is one of the crucial security solutions, which aims to detect the network’s abnormal behavior and monitor safe network traffic to avoid attacks. In particular, the effectiveness of the Machine Learning (ML)-based IDS approach to building a secure IDS application is attracting the security research community in both the general cyber network and the specific IIoT network. However, most available IIoT datasets contain multiclass output data with imbalanced distributions. This is the main reason for the reduction in the detection accuracy of attacks of the ML-based IDS model. This research proposes an IDS for IIoT imbalanced datasets by applying the eXtremely Gradient Boosting (XGBoost) model to overcome this issue. Two modern IIoT imbalanced datasets were used to assess our proposed method’s effectiveness and robustness, X-IIoTDS and TON_IoT. The XGBoost model achieved excellent attack detection with F1 scores of 99.9% and 99.87% on the two datasets. This result demonstrated that the proposed approach improved the detection attack performance in imbalanced multiclass IIoT datasets and was superior to existing IDS frameworks.
Collapse
|
4
|
Evolving Anomaly Detection for Network Streaming Data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.06.064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
5
|
Gunduz H. Malware detection framework based on graph variational autoencoder extracted embeddings from API-call graphs. PeerJ Comput Sci 2022; 8:e988. [PMID: 35634097 PMCID: PMC9137949 DOI: 10.7717/peerj-cs.988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2022] [Accepted: 04/29/2022] [Indexed: 06/15/2023]
Abstract
Malware harms the confidentiality and integrity of the information that causes material and moral damages to institutions or individuals. This study proposed a malware detection model based on API-call graphs and used Graph Variational Autoencoder (GVAE) to reduce the size of graph node features extracted from Android apk files. GVAE-reduced embeddings were fed to linear-based (SVM) and ensemble-based (LightGBM) models to finalize the malware detection process. To validate the effectiveness of the GVAE-reduced features, recursive feature elimination (RFE) and Fisher score (FS) were applied to select informative feature sets with the same sizes as GVAE-reduced embeddings. The results with RFE and FS selections revealed that LightGBM and RFE-selected 50 features achieved the highest accuracy (0.907) and F-measure (0.852) rates. When we used GVAE-reduced embeddings in the classification, there was an approximate increase of %4 in both models' accuracy rates. The same performance increase occurred in F-measure rates which directly indicated the improvement in the discrimination powers of the models. The last conducted experiment that combined the strengths of RFE selection and GVAE led to a performance increase compared to only GVAE-reduced embeddings. RFE selection achieved an accuracy rate of 0.967 in LightGBM with the help of selected 30 relevant features from the combination of all GVAE-embeddings.
Collapse
Affiliation(s)
- Hakan Gunduz
- Software Engineering Department, Kocaeli University, Kocaeli, Marmara, Turkey
| |
Collapse
|
6
|
Binbusayyis A, Alaskar H, Vaiyapuri T, Dinesh M. An investigation and comparison of machine learning approaches for intrusion detection in IoMT network. THE JOURNAL OF SUPERCOMPUTING 2022; 78:17403-17422. [PMID: 35601090 PMCID: PMC9114823 DOI: 10.1007/s11227-022-04568-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Accepted: 04/27/2022] [Indexed: 06/15/2023]
Abstract
Internet of Medical Things (IoMT) is network of interconnected medical devices (smart watches, pace makers, prosthetics, glucometer, etc.), software applications, and health systems and services. IoMT has successfully addressed many old healthcare problems. But it comes with its drawbacks essentially with patient's information privacy and security related issues that comes from IoMT architecture. Using obsolete systems can bring security vulnerabilities and draw attacker's attention emphasizing the need for effective solution to secure and protect the data traffic in IoMT network. Recently, intrusion detection system (IDS) is regarded as an essential security solution for protecting IoMT network. In the past decades, machines learning (ML) algorithms have demonstrated breakthrough results in the field of intrusion detection. Notwithstanding, to our knowledge, there is no work that investigates the power of machines learning algorithms for intrusion detection in IoMT network. This paper aims to fill this gap of knowledge investigating the application of different ML algorithms for intrusion detection in IoMT network. The investigation analysis includes ML algorithms such as K-nearest neighbor, Naïve Bayes, support vector machine, artificial neural network and decision tree. The benchmark dataset, Bot-IoT which is publicly available with comprehensive set of attacks was used to train and test the effectiveness of all ML models considered for investigation. Also, we used comprehensive set of evaluation metrics to compare the power of ML algorithms with regard to their detection accuracy for intrusion in IoMT networks. The outcome of the analysis provides a promising path to identify the best the machine learning approach can be used for building effective IDS that can safeguard IoMT network against malicious activities.
Collapse
Affiliation(s)
- Adel Binbusayyis
- College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia
| | - Haya Alaskar
- College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia
| | - Thavavel Vaiyapuri
- College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia
| | - M. Dinesh
- College of Computing and Informatics, Saudi Electronic University, Riyadh, Saudi Arabia
| |
Collapse
|
7
|
Jaw E, Wang X. A novel hybrid-based approach of snort automatic rule generator and security event correlation (SARG-SEC). PeerJ Comput Sci 2022; 8:e900. [PMID: 35494802 PMCID: PMC9044335 DOI: 10.7717/peerj-cs.900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2021] [Accepted: 02/01/2022] [Indexed: 06/14/2023]
Abstract
The rapid advanced technological development alongside the Internet with its cutting-edge applications has positively impacted human society in many aspects. Nevertheless, it equally comes with the escalating privacy and critical cybersecurity concerns that can lead to catastrophic consequences, such as overwhelming the current network security frameworks. Consequently, both the industry and academia have been tirelessly harnessing various approaches to design, implement and deploy intrusion detection systems (IDSs) with event correlation frameworks to help mitigate some of these contemporary challenges. There are two common types of IDS: signature and anomaly-based IDS. Signature-based IDS, specifically, Snort works on the concepts of rules. However, the conventional way of creating Snort rules can be very costly and error-prone. Also, the massively generated alerts from heterogeneous anomaly-based IDSs is a significant research challenge yet to be addressed. Therefore, this paper proposed a novel Snort Automatic Rule Generator (SARG) that exploits the network packet contents to automatically generate efficient and reliable Snort rules with less human intervention. Furthermore, we evaluated the effectiveness and reliability of the generated Snort rules, which produced promising results. In addition, this paper proposed a novel Security Event Correlator (SEC) that effectively accepts raw events (alerts) without prior knowledge and produces a much more manageable set of alerts for easy analysis and interpretation. As a result, alleviating the massive false alarm rate (FAR) challenges of existing IDSs. Lastly, we have performed a series of experiments to test the proposed systems. It is evident from the experimental results that SARG-SEC has demonstrated impressive performance and could significantly mitigate the existing challenges of dealing with the vast generated alerts and the labor-intensive creation of Snort rules.
Collapse
Affiliation(s)
- Ebrima Jaw
- College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou, China
- School of Information Technology and Communication, University of The Gambia (UTG), Banjul, Peace Building, Kanifing, The Gambia
| | - Xueming Wang
- College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou, China
| |
Collapse
|
8
|
Feature Selection and Ensemble-Based Intrusion Detection System: An Efficient and Comprehensive Approach. Symmetry (Basel) 2021. [DOI: 10.3390/sym13101764] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
The emergence of ground-breaking technologies such as artificial intelligence, cloud computing, big data powered by the Internet, and its highly valued real-world applications consisting of symmetric and asymmetric data distributions, has significantly changed our lives in many positive aspects. However, it equally comes with the current catastrophic daily escalating cyberattacks. Thus, raising the need for researchers to harness the innovative strengths of machine learning to design and implement intrusion detection systems (IDSs) to help mitigate these unfortunate cyber threats. Nevertheless, trustworthy and effective IDSs is a challenge due to low accuracy engendered by vast, irrelevant, and redundant features; inept detection of all types of novel attacks by individual machine learning classifiers; costly and faulty use of labeled training datasets cum significant false alarm rates (FAR) and the excessive model building and testing time. Therefore, this paper proposed a promising hybrid feature selection (HFS) with an ensemble classifier, which efficiently selects relevant features and provides consistent attack classification. Initially, we harness the various strengths of CfsSubsetEval, genetic search, and a rule-based engine to effectively select subsets of features with high correlation, which considerably reduced the model complexity and enhanced the generalization of learning algorithms, both of which are symmetry learning attributes. Moreover, using a voting method and average of probabilities, we present an ensemble classifier that used K-means, One-Class SVM, DBSCAN, and Expectation-Maximization, abbreviated (KODE) as an enhanced classifier that consistently classifies the asymmetric probability distributions between malicious and normal instances. HFS-KODE achieves remarkable results using 10-fold cross-validation, CIC-IDS2017, NSL-KDD, and UNSW-NB15 datasets and various metrics. For example, it outclassed all the selected individual classification methods, cutting-edge feature selection, and some current IDSs techniques with an excellent performance accuracy of 99.99%, 99.73%, and 99.997%, and a detection rate of 99.75%, 96.64%, and 99.93% for CIC-IDS2017, NSL-KDD, and UNSW-NB15, respectively based on only 11, 8, 13 selected relevant features from the above datasets. Finally, considering the drastically reduced FAR and time, coupled with no need for labeled datasets, it is self-evident that HFS-KODE proves to have a remarkable performance compared to many current approaches.
Collapse
|
9
|
Seraphim BI, Poovammal E, Ramana K, Kryvinska N, Penchalaiah N. A hybrid network intrusion detection using darwinian particle swarm optimization and stacked autoencoder hoeffding tree. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:8024-8044. [PMID: 34814287 DOI: 10.3934/mbe.2021398] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Cybersecurity experts estimate that cyber-attack damage cost will rise tremendously. The massive utilization of the web raises stress over how to pass on electronic information safely. Usually, intruders try different attacks for getting sensitive information. An Intrusion Detection System (IDS) plays a crucial role in identifying the data and user deviations in an organization. In this paper, stream data mining is incorporated with an IDS to do a specific task. The task is to distinguish the important, covered up information successfully in less amount of time. The experiment focuses on improving the effectiveness of an IDS using the proposed Stacked Autoencoder Hoeffding Tree approach (SAE-HT) using Darwinian Particle Swarm Optimization (DPSO) for feature selection. The experiment is performed in NSL_KDD dataset the important features are obtained using DPSO and the classification is performed using proposed SAE-HT technique. The proposed technique achieves a higher accuracy of 97.7% when compared with all the other state-of-art techniques. It is observed that the proposed technique increases the accuracy and detection rate thus reducing the false alarm rate.
Collapse
Affiliation(s)
- B Ida Seraphim
- Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India
| | - E Poovammal
- Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India
| | - Kadiyala Ramana
- Department of Artificial Intelligence & Data Science, Annamacharya Institute of Technology and Sciences, Rajampet, India
| | - Natalia Kryvinska
- Head of Information Systems Department, Faculty of Management Comenius University in Bratislava, Odbojárov 10, 82005 Bratislava 25, Slovakia
| | - N Penchalaiah
- Department of CSE, Annamacharya Institute of Technology and Sciences, Rajampet, India
| |
Collapse
|