1
|
Ahmed A, Asim M, Ullah I, Zainulabidin, Ateya AA. An optimized ensemble model with advanced feature selection for network intrusion detection. PeerJ Comput Sci 2024; 10:e2472. [PMID: 39650446 PMCID: PMC11623070 DOI: 10.7717/peerj-cs.2472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Accepted: 10/11/2024] [Indexed: 12/11/2024]
Abstract
In today's digital era, advancements in technology have led to unparalleled levels of connectivity, but have also brought forth a new wave of cyber threats. Network Intrusion Detection Systems (NIDS) are crucial for ensuring the security and integrity of networked systems by identifying and mitigating unauthorized access and malicious activities. Traditional machine learning techniques have been extensively employed for this purpose due to their high accuracy and low false alarm rates. However, these methods often fall short in detecting sophisticated and evolving threats, particularly those involving subtle variations or mutations of known attack patterns. To address this challenge, our study presents the "Optimized Random Forest (Opt-Forest)," an innovative ensemble model that combines decision forest approaches with genetic algorithms (GAs) for enhanced intrusion detection. Genetic-algorithm-based decision forest construction offers notable benefits: it traverses a wider exploration space and mitigates the risk of becoming stuck in local optima, which leads to more accurate and compact decision trees. Leveraging advanced feature selection techniques, including Best-First Search, Particle Swarm Optimization (PSO), Evolutionary Search, and Genetic Search (GS), along with a contemporary dataset, this research aims to enhance the adaptability and resilience of NIDS against modern cyber threats. We conducted a comprehensive evaluation of the proposed approach against several well-known machine learning models, including AdaBoostM1 (AbM1), K-nearest neighbor (KNN), J48-Decision Tree (J48), multilayer perceptron (MLP), stochastic gradient descent (SGD), naïve Bayes (NB), and logistic model tree (LMT). The comparative analysis demonstrates the effectiveness and superiority of our method across various performance metrics, highlighting its potential to significantly enhance the capabilities of network intrusion detection systems.
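The genetic search over feature subsets mentioned in this abstract can be illustrated with a small sketch. This is not the authors' Opt-Forest: the abstract does not give its operators or fitness function, so everything below is an assumption made for illustration. A toy nearest-centroid classifier stands in for the decision forest, and the names `genetic_feature_search`, `fitness`, and `make_toy_data`, the size penalty, and the selection scheme are all hypothetical.

```python
import random


def nearest_centroid_accuracy(rows, labels, mask):
    """Training accuracy of a nearest-centroid classifier on the selected columns."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    sums, counts = {}, {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(cols))
        for j, c in enumerate(cols):
            acc[j] += row[c]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def dist(row, centroid):
        return sum((row[c] - centroid[j]) ** 2 for j, c in enumerate(cols))

    correct = 0
    for row, y in zip(rows, labels):
        pred = min(centroids, key=lambda k: dist(row, centroids[k]))
        correct += pred == y
    return correct / len(rows)


def fitness(rows, labels, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed, favors compact subsets)."""
    return nearest_centroid_accuracy(rows, labels, mask) - penalty * sum(mask)


def genetic_feature_search(rows, labels, pop_size=20, generations=30,
                           mutation_rate=0.1, seed=0):
    """Evolve bit masks over features; returns the best mask found."""
    rng = random.Random(seed)
    n = len(rows[0])  # requires at least two features for crossover

    def score(mask):
        return fitness(rows, labels, mask)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)


def make_toy_data(n_rows=60, n_features=6, seed=1):
    """Synthetic data in which only feature 0 carries class signal."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_rows):
        y = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 4.0 * y
        rows.append(row)
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    best = genetic_feature_search(rows, labels)
    print("selected mask:", best, "fitness:", round(fitness(rows, labels, best), 3))
```

Because the parents are carried over unchanged each generation, the best fitness never decreases. A real system would score each mask with cross-validated performance of the actual classifier rather than training accuracy.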
Affiliation(s)
- Afaq Ahmed
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Muhammad Asim
  - EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
- Irshad Ullah
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Zainulabidin
  - Institute of Business and Management Sciences (IBMS), The University of Agriculture, Peshawar, Khyber Pakhtunkhwa, Pakistan
- Abdelhamied A. Ateya
  - EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
  - Department of Electronics and Communications Engineering, Zagazig University, Zagazig, Egypt
2
|
Ryoo H, Cho S, Oh T, Kim Y, Suh SH. Identification of doping suspicions through artificial intelligence-powered analysis on athlete's performance passport in female weightlifting. Front Physiol 2024; 15:1344340. [PMID: 38938745 PMCID: PMC11208455 DOI: 10.3389/fphys.2024.1344340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Accepted: 05/13/2024] [Indexed: 06/29/2024] Open
Abstract
Introduction: Doping remains a persistent concern in sports, compromising fair competition. The Athlete Biological Passport (ABP) has been a standard anti-doping measure, but confounding factors challenge its effectiveness. Our study introduces an artificial intelligence-driven approach for identifying potential doping suspicions among elite female weightlifters, utilizing the Athlete's Performance Passport (APP), which integrates both demographic profiles and performance data. Methods: Analyzing publicly available performance data in female weightlifting from 1998 to 2020, along with demographic information, encompassing 17,058 entities, we categorized weightlifters by age, body weight (BW) class, and performance level. Documented anti-doping rule violation (ADRV) cases were also retained. We employed AI-powered algorithms, including XGBoost, Multilayer Perceptron (MLP), and an Ensemble model that integrates XGBoost and MLP, to identify doping suspicions based on the dataset we obtained. Results: Our findings suggest a potential doping inclination in female weightlifters in their mid-twenties; the sanctioned prevalence was highest in the top 1% performance level and decreased thereafter. Performance profiles and sanction trends across age groups and BW classes reveal consistently superior performances in sanctioned cases. The Ensemble model showcased impressive predictive performance, achieving a 53.8% prediction rate among the weightlifters sanctioned in the 2008, 2012, and 2016 Olympics. This demonstrated the practical application of the APP in identifying potential doping suspicions. Discussion: Our study pioneers an AI-driven APP approach in anti-doping, offering a proactive and efficient methodology. The APP, coupled with advanced AI algorithms, holds promise in revolutionizing the efficiency and objectivity of doping tests, providing a novel avenue for enhancing anti-doping measures in elite female weightlifting and potentially extending to diverse sports. We also address the limitation of a constrained set of APPs, advocating for the development of a more accessible and enriched APP system for robust anti-doping practices.
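The abstract states that the Ensemble model integrates XGBoost and MLP but does not say how. One common way to combine two probabilistic classifiers is soft voting, sketched below purely as an assumption. The function names, the equal weighting, and the 0.5 threshold are hypothetical, and the input probabilities are placeholders rather than outputs of real models.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two models' predicted probabilities for the positive class."""
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same samples")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]


def flag_suspicious(probabilities, threshold=0.5):
    """Mark samples whose combined probability reaches the (assumed) threshold."""
    return [p >= threshold for p in probabilities]


if __name__ == "__main__":
    # Placeholder scores standing in for the two base models' outputs.
    tree_model_probs = [0.9, 0.2, 0.55]
    mlp_probs = [0.7, 0.4, 0.35]
    combined = soft_vote(tree_model_probs, mlp_probs)
    print(flag_suspicious(combined))
```

Averaging probabilities lets a confident model outvote an uncertain one, which hard majority voting cannot do with only two members.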
Affiliation(s)
- Hyunji Ryoo
  - Department of Physical Education, Yonsei University Graduate School, Seoul, Republic of Korea
- Samuel Cho
  - Independent Researcher, Seoul, Republic of Korea
- Taehan Oh
  - Department of Physical Education, Yonsei University Graduate School, Seoul, Republic of Korea
- YuSik Kim
  - Severance Institute for Vascular and Metabolic Research, Yonsei University College of Medicine, Seoul, Republic of Korea
- Sang-Hoon Suh
  - Department of Physical Education, College of Educational Sciences, Yonsei University, Seoul, Republic of Korea
3
|
Nankya M, Chataut R, Akl R. Securing Industrial Control Systems: Components, Cyber Threats, and Machine Learning-Driven Defense Strategies. SENSORS (BASEL, SWITZERLAND) 2023; 23:8840. [PMID: 37960539 PMCID: PMC10649322 DOI: 10.3390/s23218840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/27/2023] [Accepted: 10/27/2023] [Indexed: 11/15/2023]
Abstract
Industrial Control Systems (ICS), which include Supervisory Control and Data Acquisition (SCADA) systems, Distributed Control Systems (DCS), and Programmable Logic Controllers (PLC), play a crucial role in managing and regulating industrial processes. However, ensuring the security of these systems is of utmost importance due to the potentially severe consequences of cyber attacks. This article presents an overview of ICS security, covering its components, protocols, industrial applications, and performance aspects. It also highlights the typical threats and vulnerabilities faced by these systems. Moreover, the article identifies key factors that influence the design decisions concerning control, communication, reliability, and redundancy properties of ICS, as these are critical in determining the security needs of the system. The article outlines existing security countermeasures, including network segmentation, access control, patch management, and security monitoring. Furthermore, the article explores the integration of machine learning techniques to enhance the cybersecurity of ICS. Machine learning offers several advantages, such as anomaly detection, threat intelligence analysis, and predictive maintenance. However, combining machine learning with other security measures is essential to establish a comprehensive defense strategy for ICS. The article also addresses the challenges associated with existing measures and provides recommendations for improving ICS security. By providing an in-depth examination of the present state, challenges, and potential future advancements, this paper serves as a valuable reference for researchers aiming to make meaningful contributions within the constantly evolving ICS domain.
Affiliation(s)
- Mary Nankya
  - Computer Science Department, Fitchburg State University, Fitchburg, MA 01420, USA
- Robin Chataut
  - School of Computing and Engineering, Quinnipiac University, Hamden, CT 06514, USA
- Robert Akl
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, USA
4
|
Guan J, Yao L, Chung CR, Chiang YC, Lee TY. StackTHPred: Identifying Tumor-Homing Peptides through GBDT-Based Feature Selection with Stacking Ensemble Architecture. Int J Mol Sci 2023; 24:10348. [PMID: 37373494 DOI: 10.3390/ijms241210348] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/25/2023] [Revised: 05/31/2023] [Accepted: 06/02/2023] [Indexed: 06/29/2023] Open
Abstract
One of the major challenges in cancer therapy lies in the limited targeting specificity exhibited by existing anti-cancer drugs. Tumor-homing peptides (THPs) have emerged as a promising solution to this issue, due to their capability to specifically bind to and accumulate in tumor tissues while minimally impacting healthy tissues. THPs are short oligopeptides that offer a superior biological safety profile, with minimal antigenicity, and faster incorporation rates into target cells/tissues. However, identifying THPs experimentally, using methods such as phage display or in vivo screening, is a complex, time-consuming task, hence the need for computational methods. In this study, we proposed StackTHPred, a novel machine learning-based framework that predicts THPs using optimal features and a stacking architecture. With an effective feature selection algorithm and three tree-based machine learning algorithms, StackTHPred has demonstrated advanced performance, surpassing existing THP prediction methods. It achieved an accuracy of 0.915 and a 0.831 Matthews Correlation Coefficient (MCC) score on the main dataset, and an accuracy of 0.883 and a 0.767 MCC score on the small dataset. StackTHPred also offers favorable interpretability, enabling researchers to better understand the intrinsic characteristics of THPs. Overall, StackTHPred is beneficial for both the exploration and identification of THPs and facilitates the development of innovative cancer therapies.
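The Matthews Correlation Coefficient reported above is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the function name `mcc` and the example counts are illustrative and are not taken from the StackTHPred paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    By common convention the score is 0.0 when the denominator is zero.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(round(mcc(tp=90, tn=85, fp=15, fn=10), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and unlike accuracy it uses all four cells, so it stays informative when the classes are unevenly sized.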
Affiliation(s)
- Jiahui Guan
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Chia-Ru Chung
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Ying-Chih Chiang
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
5
|
Choo YJ, Chang MC. Use of machine learning in the field of prosthetics and orthotics: A systematic narrative review. Prosthet Orthot Int 2023; 47:226-240. [PMID: 36811961 DOI: 10.1097/pxr.0000000000000199] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/03/2021] [Accepted: 09/08/2022] [Indexed: 02/24/2023]
Abstract
Although machine learning is not yet being used in clinical practice within the fields of prosthetics and orthotics, several studies on its use in these fields have been conducted. We intend to provide relevant knowledge by conducting a systematic review of prior studies on using machine learning in the fields of prosthetics and orthotics. We searched the Medical Literature Analysis and Retrieval System Online (MEDLINE), Cochrane, Embase, and Scopus databases and retrieved studies published until July 18, 2021. The review included studies that applied machine learning algorithms to upper-limb and lower-limb prostheses and orthoses. The criteria of the Quality in Prognosis Studies tool were used to assess the methodological quality of the studies. A total of 13 studies were included in this systematic review. In the realm of prostheses, machine learning has been used to identify prostheses, select an appropriate prosthesis, train after wearing the prosthesis, detect falls, and manage the temperature in the socket. In the field of orthotics, machine learning was used to control real-time movement while wearing an orthosis and predict the need for an orthosis. The studies included in this systematic review are limited to the algorithm development stage. However, if the developed algorithms are applied in clinical practice, they are expected to help medical staff and users handle prostheses and orthoses.
Affiliation(s)
- Yoo Jin Choo
  - Production R&D Division Advanced Interdisciplinary Team, Medical Device Development Center, Daegu-Gyeongbuk Medical Innovation Foundation, Daegu, South Korea
- Min Cheol Chang
  - Department of Rehabilitation Medicine, College of Medicine, Yeungnam University, Daegu, South Korea
6
|
Chiu CC, Wu CM, Chien TN, Kao LJ, Li C, Chu CM. Integrating Structured and Unstructured EHR Data for Predicting Mortality by Machine Learning and Latent Dirichlet Allocation Method. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:4340. [PMID: 36901354 PMCID: PMC10001457 DOI: 10.3390/ijerph20054340] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/16/2023] [Revised: 02/22/2023] [Accepted: 02/24/2023] [Indexed: 06/18/2023]
Abstract
An ICU is a critical care unit that provides advanced medical support and continuous monitoring for patients with severe illnesses or injuries. Predicting the mortality rate of ICU patients can not only improve patient outcomes, but also optimize resource allocation. Many studies have attempted to create scoring systems and models that predict the mortality of ICU patients using large amounts of structured clinical data. However, unstructured clinical data recorded during patient admission, such as notes made by physicians, is often overlooked. This study used the MIMIC-III database to predict mortality in ICU patients. In the first part of the study, only eight structured variables were used, including the six basic vital signs, the GCS, and the patient's age at admission. In the second part, unstructured predictor variables were extracted from the initial diagnosis made by physicians when the patients were admitted to the hospital and analyzed using Latent Dirichlet Allocation techniques. The structured and unstructured data were combined using machine learning methods to create a mortality risk prediction model for ICU patients. The results showed that combining structured and unstructured data improved the accuracy of the prediction of clinical outcomes in ICU patients over time. The model achieved an AUROC of 0.88, indicating accurate prediction of patient vital status. Additionally, the model was able to predict patient clinical outcomes over time, successfully identifying important variables. This study demonstrated that a small number of easily collectible structured variables, combined with unstructured data and analyzed using LDA topic modeling, can significantly improve the predictive performance of a mortality risk prediction model for ICU patients. These results suggest that initial clinical observations and diagnoses of ICU patients contain valuable information that can aid ICU medical and nursing staff in making important clinical decisions.
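For readers unfamiliar with the AUROC figure quoted above, the metric can be read as the probability that a randomly chosen positive case (here, a patient who died) receives a higher risk score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name and sample scores are illustrative and unrelated to the MIMIC-III data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 outcomes; scores: predicted risk for each sample.
    Runs in O(P * N) time, which is fine for small illustrative inputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Illustrative values only.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two outcome groups.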
Affiliation(s)
- Chih-Chou Chiu
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Chung-Min Wu
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Te-Nien Chien
  - College of Management, National Taipei University of Technology, Taipei 106, Taiwan
- Ling-Jing Kao
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Chengcheng Li
  - College of Management, National Taipei University of Technology, Taipei 106, Taiwan
- Chuan-Mei Chu
  - College of Management, National Taipei University of Technology, Taipei 106, Taiwan
7
|
John L, Mahanta HJ, Soujanya Y, Sastry GN. Assessing machine learning approaches for predicting failures of investigational drug candidates during clinical trials. Comput Biol Med 2023; 153:106494. [PMID: 36587568 DOI: 10.1016/j.compbiomed.2022.106494] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/01/2022] [Revised: 11/30/2022] [Accepted: 12/27/2022] [Indexed: 12/30/2022]
Abstract
One of the major challenges in drug development is having acceptable levels of efficacy and safety throughout all the phases of clinical trials, followed by a successful launch in the market. While many factors, such as molecular properties, toxicity parameters, and mechanism of action at the target site, regulate the therapeutic action of a compound, a holistic approach directed towards data-driven studies will strengthen the predictive toxicological sciences. The aim of the current study is to identify the reasons why an investigational candidate would fail in clinical trials after multiple iterations of refinement and optimization. We have compiled a dataset that comprises approved and withdrawn drugs as well as toxic compounds, and have used a time-split approach to generate the training and validation sets. Five highly robust and scalable machine learning binary classifiers were used to develop the predictive models, which were trained with features such as molecular descriptors and fingerprints and then validated rigorously against a set of performance metrics. The mean AUC scores for all five classifiers on the hold-out test set were in the range of 0.66-0.71. The models were further used to predict the probability score for the clinical candidate dataset. The top compounds predicted to be toxic were analyzed to estimate different dimensions of toxicity. Through this study, we propose that with the appropriate use of feature extraction and machine learning methods, one can estimate the likelihood of success or failure of investigational drug candidates, thereby opening an avenue for future trends in computational toxicological studies. The models developed in the study can be accessed at https://github.com/gnsastry/predicting_clinical_trials.git.
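A time-split partition, as mentioned in this abstract, places older records in the training set and newer ones in the validation set, so the model is always evaluated on data that postdates what it learned from. The sketch below is a minimal illustration under assumed details: the record layout, the `year` field, the compound names, and the cutoff value are all hypothetical and are not taken from the paper or its repository.

```python
def time_split(records, cutoff_year, year_key="year"):
    """Split records chronologically: before the cutoff trains, the rest validates."""
    train = [r for r in records if r[year_key] < cutoff_year]
    valid = [r for r in records if r[year_key] >= cutoff_year]
    return train, valid


if __name__ == "__main__":
    # Hypothetical compound records; names, years, and labels are placeholders.
    compounds = [
        {"name": "compound_a", "year": 1998, "label": 0},
        {"name": "compound_b", "year": 2005, "label": 1},
        {"name": "compound_c", "year": 2012, "label": 0},
        {"name": "compound_d", "year": 2016, "label": 1},
    ]
    train, valid = time_split(compounds, cutoff_year=2010)
    print([r["name"] for r in train], [r["name"] for r in valid])
```

Compared with a random split, this guards against leaking future information into training, at the cost of a validation set whose distribution may differ from the training data.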
Affiliation(s)
- Lijo John
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, 785006, Assam, India
  - Polymers and Functional Materials Division, CSIR-Indian Institute of Chemical Technology, Hyderabad, 500007, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
- Hridoy Jyoti Mahanta
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, 785006, Assam, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
- Y Soujanya
  - Polymers and Functional Materials Division, CSIR-Indian Institute of Chemical Technology, Hyderabad, 500007, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
- G Narahari Sastry
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, 785006, Assam, India
  - Polymers and Functional Materials Division, CSIR-Indian Institute of Chemical Technology, Hyderabad, 500007, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
8
|
Sahani N, Zhu R, Cho JH, Liu CC. Machine Learning-based Intrusion Detection for Smart Grid Computing: A Survey. ACM TRANSACTIONS ON CYBER-PHYSICAL SYSTEMS 2023. [DOI: 10.1145/3578366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
Abstract
Machine learning (ML)-based intrusion detection system (IDS) approaches have been widely applied and have advanced state-of-the-art system security and defense mechanisms. In smart grid computing environments, security threats have increased significantly because shared networks are commonly used, along with their associated vulnerabilities. However, compared to other network environments, ML-based IDS research in the smart grid is relatively unexplored, even though the smart grid faces serious security threats due to its unique environmental vulnerabilities. In this paper, we conducted an extensive survey of ML-based IDSs in smart grids based on the following key aspects: (1) the applications of ML-based IDSs in transmission- and distribution-side power components of a smart power grid, addressing their security vulnerabilities; (2) the dataset generation process and its usage in applying ML-based IDSs in the smart grid; (3) the wide range of ML-based IDSs used by the surveyed papers in the smart grid environment; (4) metrics, complexity analysis, and evaluation testbeds of the IDSs applied in the smart grid; and (5) lessons learned, insights, and future research directions.
9
|
N S, K V. Detection of Intrusion behavior in cloud applications using Pearson's chi-squared distribution and decision tree classifiers. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
10
|
Zhang Y, Hu Y, Gao X, Gong D, Guo Y, Gao K, Zhang W. An embedded vertical-federated feature selection algorithm based on particle swarm optimisation. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2022. [DOI: 10.1049/cit2.12122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Affiliation(s)
- Yong Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
  - The Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Ying Hu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Xiaozhi Gao
  - School of Computing, University of Eastern Finland, Kuopio, Finland
- Dunwei Gong
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Yinan Guo
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Kaizhou Gao
  - The Macau Institute of Systems Engineering, Macau University of Science and Technology, Taipa, China
- Wanqiu Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
11
|
Zhang C, Jia D, Wang L, Wang W, Liu F, Yang A. Comparative Research on Network Intrusion Detection Methods Based on Machine Learning. Comput Secur 2022. [DOI: 10.1016/j.cose.2022.102861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
12
|
Rani D, Gill NS, Gulia P, Chatterjee JM. An Ensemble-Based Multiclass Classifier for Intrusion Detection Using Internet of Things. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1668676. [PMID: 35634069 PMCID: PMC9142322 DOI: 10.1155/2022/1668676] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/25/2022] [Accepted: 04/26/2022] [Indexed: 11/18/2022]
Abstract
The Internet of Things (IoT) is a fast-growing technology with applications in various domains such as healthcare and transportation. It interconnects trillions of smart devices through the Internet. A secure network is a basic necessity of the Internet of Things. Due to the increasing number of interconnected and remotely accessible smart devices, more and more cybersecurity issues are being witnessed among cyber-physical systems. An ideal intrusion detection system (IDS) can identify various cybersecurity issues and their sources. In this article, using various telemetry datasets from different Internet of Things scenarios, we show that external users can access IoT devices and infer the victim user's activity by sniffing the network traffic. Further, the article presents the performance of various bagging and boosting ensemble decision tree techniques of machine learning in the design of an efficient IDS. Most previous IDSs focused only on good accuracy and ignored execution speed, which must be improved to optimize the performance of an intrusion detection model. Most earlier research also focused on binary classification. This study evaluates the performance of various ensemble machine learning multiclass classification algorithms on the openly available "TON-IoT" datasets of IoT and Industrial IoT (IIoT) sensors.
Affiliation(s)
- Deepti Rani
  - Department of Computer Science & Applications, Maharshi Dayanand University, Rohtak, Haryana, India
- Nasib Singh Gill
  - Department of Computer Science & Applications, Maharshi Dayanand University, Rohtak, Haryana, India
- Preeti Gulia
  - Department of Computer Science & Applications, Maharshi Dayanand University, Rohtak, Haryana, India
- Jyotir Moy Chatterjee
  - Department of Information Technology, Lord Buddha Education Foundation, Kathmandu, Nepal
13
|
A Classy Multifacet Clustering and Fused Optimization Based Classification Methodologies for SCADA Security. ENERGIES 2022. [DOI: 10.3390/en15103624] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
Detecting intrusions in supervisory control and data acquisition (SCADA) systems is one of the most essential and challenging processes in recent times. Most conventional works aim to develop an efficient intrusion detection system (IDS) framework for increasing the security of SCADA against networking attacks. Nonetheless, these approaches suffer from complexity in classification, long training and testing times, and increased misprediction results and error outputs. Hence, this research work intends to develop a novel IDS framework by implementing a combination of methodologies, such as clustering, optimization, and classification. The most popular and extensively utilized SCADA attack datasets are used to implement and validate the proposed IDS framework. The main contribution of this work is to accurately detect intrusions in the given SCADA datasets with minimized computational operations and increased classification accuracy. Additionally, the proposed work aims to develop a simple and efficient classification technique for improving the security of SCADA systems. Initially, the dataset preprocessing and clustering processes were performed using the multifacet data clustering model (MDCM) in order to simplify the classification process. Then, the hybrid gradient descent spider monkey optimization (GDSMO) mechanism is implemented for selecting the optimal parameters from the clustered datasets, based on the global best solution. The main purpose of using the optimization methodology is to train the classifier with the optimized features to increase accuracy and reduce processing time. Moreover, the deep sequential long short-term memory (DS-LSTM) is employed to identify the intrusions from the clustered datasets with efficient data model training. Finally, the proposed optimization-based classification methodology’s performance and results are validated and compared using various evaluation metrics.
14
|
Revisiting Gradient Boosting-Based Approaches for Learning Imbalanced Data: A Case of Anomaly Detection on Power Grids. BIG DATA AND COGNITIVE COMPUTING 2022. [DOI: 10.3390/bdcc6020041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
Abstract
Gradient boosting ensembles have been used in the cyber-security area for many years; nonetheless, their efficacy and accuracy for intrusion detection systems (IDSs) remain questionable, particularly when dealing with problems involving imbalanced data. This article fills the void in the existing body of knowledge by evaluating the performance of gradient boosting-based ensembles, including gradient boosting machine (GBM), extreme gradient boosting (XGBoost), LightGBM, and CatBoost. This paper assesses performance on various imbalanced data sets using the Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUC), and F1 metrics. The article discusses an example of anomaly detection in an industrial control network and, more specifically, threat detection in a cyber-physical smart power grid. The tests’ results indicate that CatBoost surpassed its competitors, regardless of the imbalance ratio of the data sets. Moreover, LightGBM showed a much lower performance value and had more variability across the data sets.
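The choice of F1 (alongside MCC and AUC) for imbalanced data can be motivated with a small worked example. On a data set with 1% attacks, a detector that never raises an alarm still scores 99% accuracy while its F1 is zero. The sketch below uses made-up labels and illustrative function names; it does not reproduce any experiment from the article.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks the attack class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Synthetic 1%-positive data set: 10 attacks among 1000 samples.
    y_true = [1] * 10 + [0] * 990
    never_alarm = [0] * 1000
    print(accuracy(y_true, never_alarm), f1(y_true, never_alarm))  # 0.99 0.0
```

F1 ignores true negatives, so the large benign majority cannot inflate the score.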
15
|
Upadhyay D, Zaman M, Joshi R, Sampalli S. An Efficient Key Management and Multi-Layered Security Framework for SCADA Systems. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2022. [DOI: 10.1109/tnsm.2021.3104531] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
16
|
A Review of Research Works on Supervised Learning Algorithms for SCADA Intrusion Detection and Classification. SUSTAINABILITY 2021. [DOI: 10.3390/su13179597] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/16/2022]
Abstract
Supervisory Control and Data Acquisition (SCADA) systems play a significant role in providing remote access, monitoring, and control of critical infrastructures (CIs), which include electrical power systems, water distribution systems, nuclear power plants, etc. The growing interconnectivity, standardization of communication protocols, and remote accessibility of modern SCADA systems have contributed massively to the exposure of SCADA systems and CIs to various forms of security challenges. Any form of intrusive action on the SCADA modules and communication networks can have devastating consequences for nations due to their strategic importance to CIs’ operations. Therefore, the prompt and efficient detection and classification of SCADA system intrusions is of great importance for the operational stability of national CIs. Because of their well-recognized and documented efficiency, numerous supervised learning techniques have been proposed in the literature for SCADA intrusion detection and classification (IDC). This paper presents a critical review of recent studies in which supervised learning techniques were modelled for SCADA intrusion solutions. The paper aims to contribute to the state of the art, recognize critical open issues, and offer ideas for future studies. The intention is to provide a research-based resource for researchers working on industrial control systems security. Different supervised learning techniques for SCADA IDC systems are analyzed and compared in terms of the methodologies, datasets, and testbeds used, feature engineering and optimization mechanisms, and classification procedures. Finally, we briefly summarize some suggestions and recommendations for future research.