1
Muhsin ZJ, Qahwaji R, AlShawabkeh M, AlRyalat SA, Al Bdour M, Al-Taee M. Smart decision support system for keratoconus severity staging using corneal curvature and thinnest pachymetry indices. Eye and Vision (London, England) 2024; 11:28. [PMID: 38978067 PMCID: PMC11229244 DOI: 10.1186/s40662-024-00394-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 06/17/2024] [Indexed: 07/10/2024]
Abstract
BACKGROUND This study proposes a decision support system, created in collaboration between machine learning experts and ophthalmologists, for detecting keratoconus (KC) severity. The system employs an ensemble machine learning model and minimal corneal measurements. METHODS A clinical dataset is first obtained from Pentacam corneal tomography imaging devices; it is pre-processed, and class imbalance is addressed by applying an oversampling technique to the minority classes. Subsequently, a combination of statistical methods, visual analysis, and expert input is employed to identify the Pentacam indices most correlated with the severity class labels. These selected features are then used to develop and validate three distinct machine learning models. The model with the best classification performance is integrated into a real-world web-based application and deployed on a web application server. This deployment facilitates evaluation of the proposed system, incorporating new data and considering relevant human factors related to the user experience. RESULTS The performance of the developed system was experimentally evaluated, and the results showed an overall accuracy of 98.62%, precision of 98.70%, recall of 98.62%, F1-score of 98.66%, and F2-score of 98.64%. The application's deployment also demonstrated precise and smooth end-to-end functionality. CONCLUSION The developed decision support system establishes a robust basis for subsequent assessment by ophthalmologists before potential deployment as a screening tool for keratoconus severity detection in a clinical setting.
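The F1- and F2-scores in this abstract are both instances of the F-beta measure, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. The sketch below recomputes them from the reported precision and recall. It assumes those two figures can be plugged in directly; the abstract does not say whether the paper instead averages per-class scores. The function name `f_beta` is illustrative and not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta measure: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 1 gives the F1-score.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    # Convention: the score is 0 when both precision and recall are 0.
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


# Precision and recall as reported in the abstract (in percent).
P, R = 98.70, 98.62
f1 = f_beta(P, R, beta=1.0)
f2 = f_beta(P, R, beta=2.0)
print(f"F1 = {f1:.2f}%, F2 = {f2:.2f}%")
```

Under that assumption the two values round to 98.66% and 98.64%, which agrees with the figures quoted in the abstract.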
Affiliation(s)
- Zahra J Muhsin
- Department of Computer Science, University of Bradford, Bradford, BD7 1DP, UK.
- Rami Qahwaji
- Department of Computer Science, University of Bradford, Bradford, BD7 1DP, UK
- Muawyah Al Bdour
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Majid Al-Taee
- Department of Computer Science, University of Bradford, Bradford, BD7 1DP, UK
2
Liu Y, Wang S, Sui H, Zhu L. An ensemble learning method with GAN-based sampling and consistency check for anomaly detection of imbalanced data streams with concept drift. PLoS One 2024; 19:e0292140. [PMID: 38277426 PMCID: PMC10817223 DOI: 10.1371/journal.pone.0292140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 09/13/2023] [Indexed: 01/28/2024]
Abstract
Class imbalance combined with concept drift is a challenge in many real-world data streams, and handling it is one of the most critical tasks in anomaly detection. Learning from nonstationary data streams for anomaly detection has been well studied in recent years. However, most studies assume that the class distribution of the data stream is relatively balanced. Only a few approaches tackle the joint issue of imbalance and concept drift. To overcome this joint issue, we propose an ensemble learning method with generative adversarial network-based sampling and consistency check (EGSCC) in this paper. First, we design a comprehensive anomaly detection framework that includes an oversampling module based on a generative adversarial network (GAN), an ensemble classifier, and a consistency check module. Next, we introduce double encoders into the GAN to better capture the distribution characteristics of imbalanced data for oversampling. Then, we apply stacking ensemble learning to deal with concept drift. Four base classifiers (SVM, KNN, DT and RF) are used in the first layer, and LR is used as the meta-classifier in the second layer. Last but not least, we perform a consistency check between the incremental instance and the check set to determine whether the instance is anomalous by statistical learning, instead of using a threshold-based method. The validation set is dynamically updated according to the consistency check result. Finally, three artificial data sets obtained from the Massive Online Analysis platform and two real data sets are used to verify the performance of the proposed method from four aspects: detection performance, parameter sensitivity, algorithm cost and anti-noise ability. Experimental results show that the proposed method has significant advantages in anomaly detection of imbalanced data streams with concept drift.
Affiliation(s)
- Yansong Liu
- School of Software Engineering, Xi’an Jiao Tong University, Xi’an, Shaanxi, China
- School of Intelligent Engineering, Shandong Management University, Jinan, Shandong, China
- Shuang Wang
- Information Security Evaluation Center of Civil Aviation, Civil Aviation University of China, Tianjin, China
- He Sui
- College of Aeronautical Engineering, Civil Aviation University of China, Tianjin, China
- Li Zhu
- School of Software Engineering, Xi’an Jiao Tong University, Xi’an, Shaanxi, China
3
Zhang Y, Qu H, Tian Y, Na F, Yan J, Wu Y, Cui X, Li Z, Zhao M. PB-LNet: a model for predicting pathological subtypes of pulmonary nodules on CT images. BMC Cancer 2023; 23:936. [PMID: 37789252 PMCID: PMC10548640 DOI: 10.1186/s12885-023-11364-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 09/04/2023] [Indexed: 10/05/2023]
Abstract
OBJECTIVE To investigate the correlation between CT imaging features and pathological subtypes of pulmonary nodules and construct a prediction model using deep learning. METHODS We collected information on patients with pulmonary nodules treated by surgery; the reference standard for diagnosis was post-operative pathology. After using elastic distortion for data augmentation, the CT images were divided into a training set, a validation set and a test set in a ratio of 6:2:2. We used PB-LNet to analyze the nodules in pre-operative CT and predict their pathological subtypes. Accuracy was used as the model evaluation index and Class Activation Map was applied to interpret the results. Comparative experiments with other models were carried out to achieve the best results. Finally, images from the test set without data augmentation were analyzed to judge the clinical utility. RESULTS Four hundred seventy-seven patients were included and the nodules were divided into six groups: benign lesions, precursor glandular lesions, minimally invasive adenocarcinoma, invasive adenocarcinoma Grade 1, Grade 2 and Grade 3. The accuracy on the test set was 0.84. Class Activation Map confirmed that PB-LNet classified the nodules mainly based on the lungs in CT images, which is in line with the actual situation in clinical practice. In comparative experiments, PB-LNet obtained the highest accuracy. Finally, 96 images from the test set without data augmentation were analyzed and the accuracy was 0.89. CONCLUSIONS In classifying CT images of lung nodules into six categories based on pathological subtypes, PB-LNet demonstrates satisfactory accuracy without the need to delineate nodules, while the results are interpretable. A high level of accuracy was also obtained when validating on real data, which demonstrates its usefulness in clinical practice.
Affiliation(s)
- Yuchong Zhang
- Department of Medical Oncology, the First Hospital of China Medical University, NO.155, North Nanjing Street, Heping District, Shenyang, Liaoning Province, 110001, China
- Hui Qu
- College of Medicine and Biological Information Engineering, Northeastern University, NO. 3-11, Wenhua Road, Heping District, Shenyang, 110819, Liaoning Province, China
- Yumeng Tian
- Department of Medical Oncology, the First Hospital of China Medical University, NO.155, North Nanjing Street, Heping District, Shenyang, Liaoning Province, 110001, China
- Fangjian Na
- Network Information Center, China Medical University, NO.77 Puhe Road, Shenbei New District, Shenyang, Liaoning Province, 110122, China
- Jinshan Yan
- Department of Medical Oncology, the First Hospital of China Medical University, NO.155, North Nanjing Street, Heping District, Shenyang, Liaoning Province, 110001, China
- Ying Wu
- Phase I Clinical Trials Center, the First Hospital of China Medical University, 210 1st Baita Street, Hunnan District, Shenyang, Liaoning Province, 110101, China
- Xiaoyu Cui
- College of Medicine and Biological Information Engineering, Northeastern University, NO. 3-11, Wenhua Road, Heping District, Shenyang, 110819, Liaoning Province, China.
- Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Shenyang, China.
- Zhi Li
- Department of Medical Oncology, the First Hospital of China Medical University, NO.155, North Nanjing Street, Heping District, Shenyang, Liaoning Province, 110001, China.
- Mingfang Zhao
- Department of Medical Oncology, the First Hospital of China Medical University, NO.155, North Nanjing Street, Heping District, Shenyang, Liaoning Province, 110001, China.
4
Green DH, Langham AW, Agustin RA, Quinn DW, Leeb SB. Adaptation for Automated Drift Detection in Electromechanical Machine Monitoring. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:6768-6782. [PMID: 35737609 DOI: 10.1109/tnnls.2022.3184011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Practical machine learning applications for streaming data can involve concept drift (the change in statistical properties of data over time), one-shot or few-shot learning (starting with only one or a few examples for each class), a scarcity of representative training data, and extreme verification latency (only the initial dataset has ground-truth labels). This work presents a framework for organizing signal processing and machine learning techniques to provide adaptive classification and drift detection. Nonintrusive load monitoring (NILM) serves as an ideal case study, as modern sensing solutions provide a wellspring of electromechanical data sources. There is a lack of training datasets that generalize across different load and fault scenarios. Accordingly, training must be accomplished with a limited set of data when deploying a NILM to a new power system. Also, loads can exhibit concept drift over time either due to faults or normal variation. NILM field data is used as an illustrative case study to demonstrate the proposed framework for adaptation and drift tracking.
5
Liu Z, Zhang Y, Ding Z, He X. An Online Active Broad Learning Approach for Real-Time Safety Assessment of Dynamic Systems in Nonstationary Environments. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:6714-6724. [PMID: 36417729 DOI: 10.1109/tnnls.2022.3222265] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
Real-time safety assessment of complex dynamic systems in nonstationary environments is of great significance for avoiding potential hazards. In this case, an update procedure with high assessment accuracy and training speed is crucial in the dynamic streaming setting. Generally, the performance of most online learning approaches is negatively affected by limited annotated samples in such a setting. Moreover, the time cost of advanced conventional methods with retraining procedures is relatively high, constraining their practicality. In this article, a novel online active broad learning approach, termed OABL, is proposed. In detail, the effectiveness of the broad learning system in the framework of online active learning is first revealed and verified. A reasonable dynamic asymmetric query strategy is then designed with a limited annotation budget to actively annotate the relatively valuable samples, which helps mitigate the negative effects of class imbalance. In this context, the advantage of the human-in-the-loop characteristic is also effectively used to control the evolution direction of the learner during the incremental update, which makes it better able to adapt to complex and nonstationary environments. Several related experiments are conducted with real data from the JiaoLong deep-sea manned submersible. Results show the effectiveness and practicality of the proposal compared with existing advanced approaches.
6
Shyaa MA, Zainol Z, Abdullah R, Anbar M, Alzubaidi L, Santamaría J. Enhanced Intrusion Detection with Data Stream Classification and Concept Drift Guided by the Incremental Learning Genetic Programming Combiner. Sensors (Basel, Switzerland) 2023; 23:3736. [PMID: 37050795 PMCID: PMC10098915 DOI: 10.3390/s23073736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Revised: 03/27/2023] [Accepted: 03/31/2023] [Indexed: 06/19/2023]
Abstract
Concept drift (CD) in data streaming scenarios such as networking intrusion detection systems (IDS) refers to the change in the statistical distribution of the data over time. There are five principal variants related to CD: incremental, gradual, recurrent, sudden, and blip. Genetic programming combiner (GPC) classification is an effective core candidate for data stream classification for IDS. However, its basic structure relies on traditional static machine learning models that receive one-time training, limiting its ability to handle CD. To address this issue, we propose an extended variant of the GPC using three main components. First, we replace existing classifiers with alternatives: online sequential extreme learning machine (OSELM), feature adaptive OSELM (FA-OSELM), and knowledge preservation OSELM (KP-OSELM). Second, we add two new components to the GPC, specifically, a data balancing component and a classifier update component. Third, the coordination between the sub-models produces three novel variants of the GPC: GPC-KOS for KP-OSELM; GPC-FOS for FA-OSELM; and GPC-OS for OSELM. This article presents the first data stream-based classification framework that provides novel strategies for handling CD variants. The experimental results demonstrate that both GPC-KOS and GPC-FOS outperform the traditional GPC and other state-of-the-art methods, and the transfer learning and memory features contribute to the effective handling of most types of CD. Moreover, the application of our incremental variants on real-world datasets (KDD Cup '99, CICIDS-2017, CSE-CIC-IDS-2018, and ISCX '12) demonstrates improved performance (GPC-FOS in connection with CSE-CIC-IDS-2018 and CICIDS-2017; GPC-KOS in connection with ISCX2012 and KDD Cup '99), with maximum accuracy rates of 100% and 98% by GPC-KOS and GPC-FOS, respectively. Additionally, our GPC variants do not show superior performance in handling blip drift.
Affiliation(s)
- Methaq A. Shyaa
- School of Computer Sciences, Universiti Sains Malaysia, USM, Gelugor 11800, Pulau Penang, Malaysia
- Zurinahni Zainol
- School of Computer Sciences, Universiti Sains Malaysia, USM, Gelugor 11800, Pulau Penang, Malaysia
- Rosni Abdullah
- School of Computer Sciences, Universiti Sains Malaysia, USM, Gelugor 11800, Pulau Penang, Malaysia
- Mohammed Anbar
- National Advanced IPv6 Centre (NAv6), Universiti Sains Malaysia, USM, Gelugor 11800, Pulau Penang, Malaysia
- Laith Alzubaidi
- School of Mechanical, Medical, and Process Engineering, Queensland University of Technology, Brisbane, QLD 4000, Australia
- Centre for Data Science, Queensland University of Technology, Brisbane, QLD 4000, Australia
- José Santamaría
- Department of Computer Science, University of Jaén, 23071 Jaén, Spain
7
Peng X, Wang FY, Li L. MixGradient: A gradient-based re-weighting scheme with mixup for imbalanced data streams. Neural Netw 2023; 161:525-534. [PMID: 36805267 DOI: 10.1016/j.neunet.2023.02.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 11/22/2022] [Accepted: 02/09/2023] [Indexed: 02/15/2023]
Abstract
A challenge for contemporary deep neural networks in real-world problems is learning from an imbalanced data stream, where data tends to be received chunk by chunk over time, and the prior class distribution is severely imbalanced. Although many sophisticated algorithms have been derived, most of them overlook the importance of gradient information. From this perspective, the difficulty of learning from imbalanced data streams lies in the fact that the gradient estimated on an uneven class distribution is not informative enough to reflect the critical pattern of each class. To this end, we propose to assign higher weights on the training samples whose gradients are close to the gradient of corresponding typical samples, thus highlighting the important samples in minority classes and suppressing the noisy samples in majority classes. Such an idea can be combined with Mixup, which exploits the interpolation information of data to further compensate for the information of sample space that the typical samples do not provide and expand the role of the proposed re-weighting scheme. Experiments on artificially induced long-tailed CIFAR data streams and long-tailed MiniPlaces data stream show that the resulting method, termed MixGradient, boosts the generalization performance of DNNs under different imbalance ratios and achieves up to 10% accuracy improvement.
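The interpolation that Mixup contributes to this scheme can be sketched in a few lines. The snippet below shows only the commonly used Mixup formulation, in which a mixing coefficient lambda is drawn from a Beta(alpha, alpha) distribution and applied to both inputs and one-hot labels. It does not implement MixGradient's gradient-based re-weighting, and the function name `mixup_pair` and the default `alpha` are assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup_pair(
    x_i: Sequence[float],
    y_i: Sequence[float],
    x_j: Sequence[float],
    y_j: Sequence[float],
    alpha: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float], float]:
    """Blend two (input, one-hot label) pairs with a Beta-distributed weight."""
    if len(x_i) != len(x_j) or len(y_i) != len(y_j):
        raise ValueError("paired inputs and paired labels must have equal lengths")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    # The same coefficient is applied to the features and to the labels.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam


# Toy example: mix a sample of class 0 with a sample of class 1.
x_mix, y_mix, lam = mixup_pair(
    [0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 1.0], rng=random.Random(0)
)
print(lam, x_mix, y_mix)
```

Because the labels are mixed with the same coefficient as the inputs, the mixed label remains a valid probability vector.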
Affiliation(s)
- Xinyu Peng
- Department of Automation, Tsinghua University, Beijing, 100084, China.
- Fei-Yue Wang
- State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, China.
- Li Li
- Department of Automation, Tsinghua University, Beijing, 100084, China; National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China.
8
Elreedy D, Atiya AF, Kamalov F. A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning. Mach Learn 2023. [DOI: 10.1007/s10994-022-06296-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract
Class imbalance occurs when the class distribution is not equal. Namely, one class is under-represented (minority class), and the other class has significantly more samples in the data (majority class). The class imbalance problem is prevalent in many real world applications. Generally, the under-represented minority class is the class of interest. The synthetic minority over-sampling technique (SMOTE) method is considered the most prominent method for handling unbalanced data. The SMOTE method generates new synthetic data patterns by performing linear interpolation between minority class samples and their K nearest neighbors. However, the SMOTE generated patterns do not necessarily conform to the original minority class distribution. This paper develops a novel theoretical analysis of the SMOTE method by deriving the probability distribution of the SMOTE generated samples. To the best of our knowledge, this is the first work deriving a mathematical formulation for the SMOTE patterns’ probability distribution. This allows us to compare the density of the generated samples with the true underlying class-conditional density, in order to assess how representative the generated samples are. The derived formula is verified by computing it on a number of densities versus densities computed and estimated empirically.
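The interpolation step described in this abstract can be illustrated with a minimal sketch. It follows the textbook SMOTE recipe: pick a minority sample, pick one of its K nearest minority neighbours, and draw a point uniformly on the segment between them. It is not the authors' code and does not reproduce their distribution analysis. The function name `smote`, the Euclidean distance, and the uniform interpolation weight are assumptions made for illustration.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def smote(
    minority: Sequence[Point],
    n_new: int,
    k: int = 5,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Generate n_new synthetic minority samples by linear interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random()
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: three points forming a right triangle.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority_points, n_new=10, k=2, rng=random.Random(42))
print(new_points[:3])
```

Every synthetic point lies on a segment joining two existing minority samples, so the generated samples are confined to such segments rather than drawn from the underlying class-conditional density. This is consistent with the abstract's remark that they need not conform to the original minority class distribution.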
9
Malialis K, Panayiotou CG, Polycarpou MM. Nonstationary data stream classification with online active learning and siamese neural networks. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
10
Takada T, Kitajima T. Trend-following with better adaptation to large downside risks. PLoS One 2022; 17:e0276322. [PMID: 36256670 PMCID: PMC9578607 DOI: 10.1371/journal.pone.0276322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Accepted: 10/04/2022] [Indexed: 11/18/2022]
Abstract
Avoiding losses from long-term trend reversals is challenging, and trend-following is one of the few trading approaches to explore it. While trend-following is popular among investors and has gained increased attention in academia, the recent diminished profitability in equity markets casts doubt on its effectiveness. To clarify its cause and suggest remedies, we thoroughly examine the effect of market conditions and averaging window on recent profitability using four major stock indices in an out-of-sample experiment comparing trend-following rules (moving average and momentum) and a machine-classification-based non-trend-following rule. In addition to the significant advantage of trend-following rules in avoiding downside risks, we find a discrepancy in the optimum averaging window size between trend direction phases, which is exacerbated by a higher positive trend direction ratio. A higher positive trend direction ratio leads to poor performance relative to buy-and-hold returns. This discrepancy creates the dilemma of choosing which trend direction phase to emphasize. Incorporating machine-learning into trend-following is effective for alleviating this dilemma. We find that the profit-maximizing averaging window realizes the level that best balances the dilemma and suggest a simple guideline for selecting the optimum averaging window. We attribute the sluggishness of trend-following in recent equity markets to the insufficient chances of trend reversals rather than their loss of profitability. Our results contribute to improving the performance of trend following by mitigating the dilemma.
Affiliation(s)
- Teruko Takada
- Graduate School of Business, Osaka Metropolitan University, Osaka, Japan
- Takahiro Kitajima
- Graduate School of Business, Osaka Metropolitan University, Osaka, Japan
- Faculty of Commerce, Kumamoto Gakuen University, Kumamoto, Japan
11
Song Y, Lu J, Liu A, Lu H, Zhang G. A Segment-Based Drift Adaptation Method for Data Streams. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4876-4889. [PMID: 33835922 DOI: 10.1109/tnnls.2021.3062062] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
In concept drift adaptation, we aim to design a blind or an informed strategy to update our best predictor for future data at each time point. However, existing informed drift adaptation methods need to wait for an entire batch of data to detect drift and then update the predictor (if drift is detected), which causes adaptation delay. To overcome the adaptation delay, we propose a sequentially updated statistic, called drift-gradient to quantify the increase of distributional discrepancy when every new instance arrives. Based on drift-gradient, a segment-based drift adaptation (SEGA) method is developed to online update our best predictor. Drift-gradient is defined on a segment in the training set. It can precisely quantify the increase of distributional discrepancy between the old segment and the newest segment when only one new instance is available at each time point. A lower value of drift-gradient on the old segment represents that the distribution of the new instance is closer to the distribution of the old segment. Based on the drift-gradient, SEGA retrains our best predictors with the segments that have the minimum drift-gradient when every new instance arrives. SEGA has been validated by extensive experiments on both synthetic and real-world, classification and regression data streams. The experimental results show that SEGA outperforms competitive blind and informed drift adaptation methods.
12
Pan T, Pedrycz W, Yang J, Wu W, Zhang Y. A new classifier for imbalanced data with iterative learning process and ensemble operating process. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
13
Fadel MM, El-Ghamrawy SM, Ali-Eldin AMT, Hassan MK, El-Desoky AI. The proposed hybrid deep learning intrusion prediction IoT (HDLIP-IoT) framework. PLoS One 2022; 17:e0271436. [PMID: 35905101 PMCID: PMC9337696 DOI: 10.1371/journal.pone.0271436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 06/30/2022] [Indexed: 11/18/2022]
Abstract
Throughout the past few years, the Internet of Things (IoT) has grown in popularity because of its ease of use and flexibility. Cyber criminals are interested in IoT because it offers a variety of benefits for users, but it still poses many types of threats. The most common form of attack against IoT is Distributed Denial of Service (DDoS). The growth of preventive processes against DDoS attacks has prompted IoT professionals and security experts to focus on this topic. Due to the increasing prevalence of DDoS attacks, some methods for distinguishing different types of DDoS attacks based on individual network features have become hard to implement. Additionally, monitoring traffic pattern changes and detecting DDoS attacks accurately are urgent and necessary. In this paper, using Modified Whale Optimization Algorithm (MWOA) feature extraction and a hybrid Long Short-Term Memory (LSTM) network, we show that DDoS attack detection methods can be developed and tested on various datasets. The MWOA technique is used to optimize the weights of the LSTM neural network in order to reduce prediction errors in the hybrid LSTM algorithm. Additionally, MWOA can optimally extract IP packet features and identify DDoS attacks with the support of the MWOA-LSTM model. The proposed MWOA-LSTM framework outperforms standard support vector machines (SVM) and Genetic Algorithm (GA) as well as standard methods for detecting attacks based on precision, recall and accuracy measurements.
Affiliation(s)
- Magdy M. Fadel
- Computer Engineering and Systems Department, Faculty of Engineering, Mansoura University, Mansoura, Dakahlia, Egypt
- Sally M. El-Ghamrawy
- Head of Communications and Computer Engineering Department, MISR Higher Institute for Engineering and Technology, Mansoura, Dakahlia, Egypt
- Amr M. T. Ali-Eldin
- Computer Engineering and Systems Department, Faculty of Engineering, Mansoura University, Mansoura, Dakahlia, Egypt
- Mohammed K. Hassan
- Mechatronics Department, Faculty of Engineering, Horus University in Egypt (HUE), New Damietta, Damietta, Egypt
- Ali I. El-Desoky
- Computer Engineering and Systems Department, Faculty of Engineering, Mansoura University, Mansoura, Dakahlia, Egypt
14
Wang P, Jin N, Woo WL, Woodward JR, Davies D. Noise tolerant drift detection method for Data Stream Mining. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
15
DFAID: Density-aware and feature-deviated active intrusion detection over network traffic streams. Comput Secur 2022. [DOI: 10.1016/j.cose.2022.102719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
16
Li Y, Zhang J, Zhang S, Xiao W, Zhang Z. Multi-objective optimization-based adaptive class-specific cost extreme learning machine for imbalanced classification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
17
Jiao B, Guo Y, Gong D, Chen Q. Dynamic Ensemble Selection for Imbalanced Data Streams With Concept Drift. IEEE Transactions on Neural Networks and Learning Systems 2022; PP:1278-1291. [PMID: 35731763 DOI: 10.1109/tnnls.2022.3183120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Ensemble learning, as a popular method to tackle concept drift in data streams, forms a combination of base classifiers according to their global performances. However, concept drift generally occurs in local data space, causing significantly different performances of a base classifier at different locations. Thus, employing global performance as a criterion to select base classifiers is inappropriate. Moreover, data streams are often accompanied by the class imbalance problem, which affects the classification accuracy of ensemble learning on minority instances. To address these problems, a dynamic ensemble selection for imbalanced data streams with concept drift (DES-ICD) is proposed. For data arriving chunk by chunk, a novel synthetic minority oversampling technique with adaptive nearest neighbors (AnnSMOTE) is developed to generate new minority instances that conform to the new concept. Following that, DES-ICD creates a base classifier on each newly arrived data chunk balanced by AnnSMOTE and merges it with historical base classifiers to form a candidate classifier pool. For each query instance, the optimal combination is constructed in terms of the performance of candidate classifiers in its neighborhood. Experimental results for nine synthetic and five real-world datasets show that the proposed method outperforms seven comparative methods on classification accuracy and tracks new concepts in an imbalanced data stream more precisely.
18
Bayram F, Ahmed BS, Kassler A. From concept drift to model degradation: An overview on performance-aware drift detectors. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
19
Information resources estimation for accurate distribution-based concept drift detection. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.102911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
20
|
Liu T, Chen S, Liang S, Gan S, Harris CJ. Multi-Output Selective Ensemble Identification of Nonlinear and Nonstationary Industrial Processes. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:1867-1880. [PMID: 33052869 DOI: 10.1109/tnnls.2020.3027701] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
A key characteristic of biological systems is the ability to update the memory by learning new knowledge and removing out-of-date knowledge so that intelligent decisions can be made based on the relevant knowledge acquired in the memory. Inspired by this fundamental biological principle, this article proposes a multi-output selective ensemble regression (SER) for online identification of multi-output nonlinear time-varying industrial processes. Specifically, an adaptive local learning approach is developed to automatically identify and encode a newly emerging process state by fitting a local multi-output linear model based on the multi-output hypothesis testing. This growth strategy ensures a highly diverse and independent local model set. The online modeling is constructed as a multi-output SER predictor by optimizing the combining weights of the selected local multi-output models based on a probability metric. An effective pruning strategy is also developed to remove the unwanted out-of-date local multi-output linear models in order to achieve low online computational complexity without sacrificing the prediction accuracy. A simulated two-output process and two real-world identification problems are used to demonstrate the effectiveness of the proposed multi-output SER over a range of benchmark schemes for real-time identification of multi-output nonlinear and nonstationary processes, in terms of both online identification accuracy and computational complexity.
|
21
|
ROSE: robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams. Mach Learn 2022. [DOI: 10.1007/s10994-022-06168-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
22
|
Ahmad R, Alsmadi I, Alhamdani W, Tawalbeh L. A comprehensive deep learning benchmark for IoT IDS. Comput Secur 2022. [DOI: 10.1016/j.cose.2021.102588] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022]
|
23
|
Zhang Y, Lam S, Yu T, Teng X, Zhang J, Lee FKH, Au KH, Yip CWY, Wang S, Cai J. Integration of an imbalance framework with novel high-generalizable classifiers for radiomics-based distant metastases prediction of advanced nasopharyngeal carcinoma. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107649] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/09/2022]
|
24
|
Models versus Datasets: Reducing Bias through Building a Comprehensive IDS Benchmark. FUTURE INTERNET 2021. [DOI: 10.3390/fi13120318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Today, deep learning approaches are widely used to build Intrusion Detection Systems for securing IoT environments. However, the models’ hidden and complex nature raises various concerns, such as trusting the model output and understanding why the model made certain decisions. Researchers generally publish their proposed model’s settings and performance results based on a specific dataset and a classification model but do not report the proposed model’s output and findings. Similarly, many researchers suggest an IDS solution by focusing only on a single benchmark dataset and classifier. Such solutions are prone to generating inaccurate and biased results. This paper overcomes these limitations in previous work by analyzing various benchmark datasets and various individual and hybrid deep learning classifiers towards finding the best IDS solution for IoT that is efficient, lightweight, and comprehensive in detecting network anomalies. We also showed the model’s localized predictions and analyzed the top contributing features impacting the global performance of deep learning models. This paper aims to extract the aggregate knowledge from various datasets and classifiers and analyze the commonalities to avoid any possible bias in results and increase the trust and transparency of deep learning models. We believe this paper’s findings will help future researchers build a comprehensive IDS based on well-performing classifiers and utilize the aggregated knowledge and the minimum set of significantly contributing features.
|
25
|
Gärtler M, Khaydarov V, Klöpper B, Urbas L. The Machine Learning Life Cycle in Chemical Operations – Status and Open Challenges. CHEM-ING-TECH 2021. [DOI: 10.1002/cite.202100134] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Affiliation(s)
- Marco Gärtler
- ABB Corporate Research Center Wallstadter Straße 59 68526 Ladenburg Germany
| | - Valentin Khaydarov
- Technische Universität Dresden Professur für Prozessleittechnik 01062 Dresden Germany
| | - Benjamin Klöpper
- ABB Corporate Research Center Wallstadter Straße 59 68526 Ladenburg Germany
| | - Leon Urbas
- Technische Universität Dresden Professur für Prozessleittechnik 01062 Dresden Germany
| |
|
26
|
Kulkarni V, Gawali M, Kharat A. Key Technology Considerations in Developing and Deploying Machine Learning Models in Clinical Radiology Practice. JMIR Med Inform 2021; 9:e28776. [PMID: 34499049 PMCID: PMC8461525 DOI: 10.2196/28776] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2021] [Revised: 06/29/2021] [Accepted: 07/10/2021] [Indexed: 12/29/2022] Open
Abstract
The use of machine learning to develop intelligent software tools for the interpretation of radiology images has gained widespread attention in recent years. The development, deployment, and eventual adoption of these models in clinical practice, however, remains fraught with challenges. In this paper, we propose a list of key considerations that machine learning researchers must recognize and address to make their models accurate, robust, and usable in practice. We discuss insufficient training data, decentralized data sets, high cost of annotations, ambiguous ground truth, imbalance in class representation, asymmetric misclassification costs, relevant performance metrics, generalization of models to unseen data sets, model decay, adversarial attacks, explainability, fairness and bias, and clinical validation. We describe each consideration and identify the techniques used to address it. Although these techniques have been discussed in prior research, by freshly examining them in the context of medical imaging and compiling them in the form of a laundry list, we hope to make them more accessible to researchers, software developers, radiologists, and other stakeholders.
Affiliation(s)
| | | | - Amit Kharat
- DeepTek Inc, Pune, India
- D Y Patil University, Pune, India
| |
|
27
|
|
28
|
Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:8813806. [PMID: 34381499 PMCID: PMC8352686 DOI: 10.1155/2021/8813806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Revised: 07/04/2021] [Accepted: 07/21/2021] [Indexed: 11/17/2022]
Abstract
Class imbalance and concept drift are two primary issues that exist concurrently in data stream classification. Although the two issues have each drawn considerable attention separately, their joint treatment remains largely unexplored. Moreover, the class imbalance issue is further complicated when data streams exhibit concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification is introduced to overcome the two issues simultaneously. The CSDS considers cost information during the procedures of data preprocessing and classification. During the data preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm for alleviating the class imbalance at the data level. In the classification process, a cost-sensitive weighting schema is devised to enhance the overall performance of the ensemble. Besides, a change detection mechanism is embedded in our algorithm, which guarantees that an ensemble can capture and react to drift promptly. Experimental results validate that our method can obtain better classification results under different imbalanced concept drifting data stream scenarios.
|
29
|
Wang X, Kang Q, Zhou M, Pan L, Abusorrah A. Multiscale Drift Detection Test to Enable Fast Learning in Nonstationary Environments. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:3483-3495. [PMID: 32544055 DOI: 10.1109/tcyb.2020.2989213] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
A model can be easily influenced by unseen factors in nonstationary environments and fail to fit a dynamic data distribution. In a classification scenario, this is known as concept drift. For instance, the shopping preferences of customers may change after they move from one city to another. Therefore, a shopping website or application should alter its recommendations when its predictions of such user patterns deteriorate. In this article, we propose a novel approach called the multiscale drift detection test (MDDT) that efficiently localizes abrupt drift points when feature values fluctuate, meaning that the current model needs immediate adaptation. MDDT is based on a resampling scheme and a paired Student's t-test. It applies a detection procedure on two different scales. Initially, the detection is performed on a broad scale to check if recently gathered drift indicators remain stationary. If a drift is claimed, a narrow scale detection is performed to trace the refined change time. This multiscale structure reduces the considerable time spent on constant checking and filters noise in drift indicators. Experiments are performed to compare the proposed method with several algorithms via synthetic and real-world datasets. The results indicate that it outperforms others when abrupt shift datasets are handled, and achieves the highest recall score in localizing drift points.
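The paired t-test that this abstract names as a building block can be sketched as follows. This only illustrates the test statistic on two windows of a drift indicator. MDDT's resampling scheme and its broad and narrow detection scales are not reproduced, and the function name and the error-rate values are hypothetical.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Paired Student's t statistic for two equally long sequences."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        # Identical differences: no variation to test against.
        return 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))

# Hypothetical per-batch error rates before and after a suspected drift point.
errors_before = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11]
errors_after = [0.21, 0.19, 0.23, 0.20, 0.22, 0.18]
t_value = paired_t_statistic(errors_before, errors_after)
```

With six pairs there are five degrees of freedom, so a statistic well above the two-sided 5% critical value of about 2.571 would suggest that the indicator is no longer stationary.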
|
30
|
Liu A, Lu J, Zhang G. Concept Drift Detection via Equal Intensity k-Means Space Partitioning. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:3198-3211. [PMID: 32324590 DOI: 10.1109/tcyb.2020.2983962] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The data stream poses additional challenges to statistical classification tasks because distributions of the training and target samples may differ as time passes. Such a distribution change in streaming data is called concept drift. Numerous histogram-based distribution change detection methods have been proposed to detect drift. Most histograms are developed on grid-based or tree-based space partitioning algorithms, which makes the space partitions arbitrary and unexplainable and may cause drift blind spots. There is a need to improve the drift detection accuracy for the histogram-based methods with the unsupervised setting. To address this problem, we propose a cluster-based histogram, called equal intensity k-means space partitioning (EI-kMeans). In addition, a heuristic method to improve the sensitivity of drift detection is introduced. The fundamental idea of improving the sensitivity is to minimize the risk of creating partitions in distribution offset regions. Pearson's chi-square test is used as the statistical hypothesis test so that the test statistics remain independent of the sample distribution. The number of bins and their shapes, which strongly influence the ability to detect drift, are determined dynamically from the sample based on an asymptotic constraint in the chi-square test. Accordingly, three algorithms are developed to implement concept drift detection, including a greedy centroids initialization algorithm, a cluster amplify-shrink algorithm, and a drift detection algorithm. For drift adaptation, we recommend retraining the learner if a drift is detected. The results of experiments on the synthetic and real-world datasets demonstrate the advantages of EI-kMeans and show its efficacy in detecting concept drift.
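The histogram comparison described here rests on Pearson's chi-square test, which can be sketched for two windows that share the same bins. The bins below are hypothetical fixed counts. EI-kMeans derives its partitions from k-means clusters, and that step is not reproduced.

```python
def chi_square_two_sample(counts_ref, counts_new):
    """Pearson chi-square statistic for a 2 x B contingency table of bin counts."""
    if len(counts_ref) != len(counts_new):
        raise ValueError("both windows must use the same bins")
    total_ref, total_new = sum(counts_ref), sum(counts_new)
    if total_ref == 0 or total_new == 0:
        raise ValueError("each window needs at least one observation")
    grand_total = total_ref + total_new
    stat = 0.0
    for ref, new in zip(counts_ref, counts_new):
        column_total = ref + new
        if column_total == 0:
            continue  # an empty bin contributes nothing
        for observed, row_total in ((ref, total_ref), (new, total_new)):
            expected = row_total * column_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical bin counts: a reference window and a newer, shifted window.
reference_counts = [25, 25, 25, 25]
shifted_counts = [10, 20, 30, 40]
drift_stat = chi_square_two_sample(reference_counts, shifted_counts)
```

With four bins the table has three degrees of freedom, so a statistic above the 5% critical value of about 7.815 would be read as evidence of a distribution change.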
|
31
|
Farrugia D, Zerafa C, Cini T, Kuasney B, Livori K. A Real-Time Prescriptive Solution for Explainable Cyber-Fraud Detection Within the iGaming Industry. SN COMPUTER SCIENCE 2021; 2:215. [PMID: 33880451 PMCID: PMC8049394 DOI: 10.1007/s42979-021-00623-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/07/2020] [Accepted: 03/27/2021] [Indexed: 11/24/2022]
Abstract
This paper presents a real-time fully autonomous prescriptive solution for explainable cyber-fraud detection within the iGaming industry. We demonstrate how our solution facilitates the time-consuming task of player risk and fraud assessment through prescriptive analytics. Our tool leverages machine learning algorithms and advancements in the field of eXplainable AI to derive smarter predictions empowered by local interpretable explanations in real-time. Our best-performing pipeline was able to predict fraudulent behaviour with an average precision of 84.2% and an area under the receiver operating characteristics of 0.82 on our dataset. We also addressed the phenomenon of concept-drift and discussed our empirical and data-driven strategy for detecting and dealing with this problem. Finally, we cover how local interpretable explanations can help adopt a pro-active stance in fighting fraud.
Affiliation(s)
| | | | - Tony Cini
- Gaming Innovation Group, St. Julians, Malta
| | | | | |
|
32
|
Sarnovsky M, Kolarik M. Classification of the drifting data streams using heterogeneous diversified dynamic class-weighted ensemble. PeerJ Comput Sci 2021; 7:e459. [PMID: 33834113 PMCID: PMC8022634 DOI: 10.7717/peerj-cs.459] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2020] [Accepted: 03/05/2021] [Indexed: 06/12/2023]
Abstract
Data streams can be defined as the continuous stream of data coming from different sources and in different forms. Streams are often very dynamic, and their underlying structure usually changes over time, which may result in a phenomenon called concept drift. When solving predictive problems using the streaming data, traditional machine learning models trained on historical data may become invalid when such changes occur. Adaptive models equipped with mechanisms to reflect the changes in the data proved to be suitable to handle drifting streams. Adaptive ensemble models represent a popular group of these methods used in classification of drifting data streams. In this paper, we present the heterogeneous adaptive ensemble model for the data streams classification, which utilizes the dynamic class weighting scheme and a mechanism to maintain the diversity of the ensemble members. Our main objective was to design a model consisting of a heterogeneous group of base learners (Naive Bayes, k-NN, Decision trees), with an adaptive mechanism which, besides the performance of the members, also takes into account the diversity of the ensemble. The model was experimentally evaluated on both real-world and synthetic datasets. We compared the presented model with other existing adaptive ensemble methods, both from the perspective of predictive performance and computational resource requirements.
|
33
|
Gu J, Lu S. An effective intrusion detection approach using SVM with naïve Bayes feature embedding. Comput Secur 2021. [DOI: 10.1016/j.cose.2020.102158] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
|
34
|
The impact of data difficulty factors on classification of imbalanced and concept drifting data streams. Knowl Inf Syst 2021. [DOI: 10.1007/s10115-021-01560-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
Class imbalance introduces additional challenges when learning classifiers from concept drifting data streams. Most existing work focuses on designing new algorithms for dealing with the global imbalance ratio and does not consider other data complexities. Independent research on static imbalanced data has highlighted the influential role of local data difficulty factors such as minority class decomposition and presence of unsafe types of examples. Despite often being present in real-world data, the interactions between concept drifts and local data difficulty factors have not been investigated in concept drifting data streams yet. We thoroughly study the impact of such interactions on drifting imbalanced streams. For this purpose, we put forward a new categorization of concept drifts for class imbalanced problems. Through comprehensive experiments with synthetic and real data streams, we study the influence of concept drifts, global class imbalance, local data difficulty factors, and their combinations, on predictions of representative online classifiers. Experimental results reveal the high influence of newly considered factors and their local drifts, as well as differences in existing classifiers’ reactions to such factors. Combinations of multiple factors are the most challenging for classifiers. Although existing classifiers are partially capable of coping with global class imbalance, new approaches are needed to address challenges posed by imbalanced data streams.
|
35
|
A comprehensive active learning method for multiclass imbalanced data streams with concept drift. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106778] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
36
|
Liu A, Lu J, Zhang G. Diverse Instance-Weighting Ensemble Based on Region Drift Disagreement for Concept Drift Adaptation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:293-307. [PMID: 32217484 DOI: 10.1109/tnnls.2020.2978523] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Concept drift refers to changes in the distribution of underlying data and is an inherent property of evolving data streams. Ensemble learning, with dynamic classifiers, has proved to be an efficient method of handling concept drift. However, the best way to create and maintain ensemble diversity with evolving streams is still a challenging problem. In contrast to estimating diversity via inputs, outputs, or classifier parameters, we propose a diversity measurement based on whether the ensemble members agree on the probability of a regional distribution change. In our method, estimations over regional distribution changes are used as instance weights. Constructing different region sets through different schemes will lead to different drift estimation results, thereby creating diversity. The classifiers that disagree the most are selected to maximize diversity. Accordingly, an instance-based ensemble learning algorithm, called the diverse instance-weighting ensemble (DiwE), is developed to address concept drift for data stream classification problems. Evaluations of various synthetic and real-world data stream benchmarks show the effectiveness and advantages of the proposed algorithm.
|
37
|
Akter MS, Islam MR, Tanaka T, Iimura Y, Mitsuhashi T, Sugano H, Wang D, Molla MKI. Statistical Features in High-Frequency Bands of Interictal iEEG Work Efficiently in Identifying the Seizure Onset Zone in Patients with Focal Epilepsy. ENTROPY 2020; 22:e22121415. [PMID: 33334058 PMCID: PMC7765521 DOI: 10.3390/e22121415] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/20/2020] [Revised: 12/11/2020] [Accepted: 12/11/2020] [Indexed: 01/22/2023]
Abstract
The design of a computer-aided system for identifying the seizure onset zone (SOZ) from interictal and ictal electroencephalograms (EEGs) is desired by epileptologists. This study aims to introduce the statistical features of high-frequency components (HFCs) in interictal intracranial electroencephalograms (iEEGs) to identify the possible SOZ channels. It is known that the activity of HFCs in interictal iEEGs, including ripple and fast ripple bands, is associated with epileptic seizures. This paper proposes to decompose multi-channel interictal iEEG signals into a number of subbands. For every 20 s segment, twelve features are computed from each subband. A mutual information (MI)-based method with grid search was applied to select the most prominent bands and features. A gradient-boosting decision tree-based algorithm called LightGBM was used to score each segment of the channels and these were averaged together to achieve a final score for each channel. The possible SOZ channels were localized as the channels with the higher scores. Experiments with eleven epilepsy patients were conducted to compare the efficiency of the proposed design with state-of-the-art methods.
Affiliation(s)
- Most. Sheuli Akter
- Department of Electronic and Information Engineering, Tokyo University of Agriculture and Technology, Tokyo 184-8588, Japan;
| | - Md. Rabiul Islam
- Institute of Global Innovation Research, Tokyo University of Agriculture and Technology, Tokyo 184-8588, Japan;
| | - Toshihisa Tanaka
- Department of Electronic and Information Engineering, Tokyo University of Agriculture and Technology, Tokyo 184-8588, Japan;
- Institute of Global Innovation Research, Tokyo University of Agriculture and Technology, Tokyo 184-8588, Japan;
- Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Tokyo 184-8588, Japan;
- Department of Neurosurgery, Epilepsy Center, Juntendo University, Tokyo 113-8421, Japan; (Y.I.); (T.M.); (H.S.)
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
- RIKEN Center for Brain Science, Saitama 351-0106, Japan
- School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
- Correspondence: ; Tel.: +81-42-388-7123
| | - Yasushi Iimura
- Department of Neurosurgery, Epilepsy Center, Juntendo University, Tokyo 113-8421, Japan; (Y.I.); (T.M.); (H.S.)
| | - Takumi Mitsuhashi
- Department of Neurosurgery, Epilepsy Center, Juntendo University, Tokyo 113-8421, Japan; (Y.I.); (T.M.); (H.S.)
| | - Hidenori Sugano
- Department of Neurosurgery, Epilepsy Center, Juntendo University, Tokyo 113-8421, Japan; (Y.I.); (T.M.); (H.S.)
| | - Duo Wang
- Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Tokyo 184-8588, Japan;
| | | |
|
38
|
Dallora AL, Minku L, Mendes E, Rennemark M, Anderberg P, Sanmartin Berglund J. Multifactorial 10-Year Prior Diagnosis Prediction Model of Dementia. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:E6674. [PMID: 32937765 PMCID: PMC7557767 DOI: 10.3390/ijerph17186674] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/13/2020] [Revised: 09/09/2020] [Accepted: 09/10/2020] [Indexed: 12/23/2022]
Abstract
Dementia is a neurodegenerative disorder that affects the older adult population. To date, no cure or treatment to change its course is available. Since changes in the brains of affected individuals could be evidenced as early as 10 years before the onset of symptoms, prognosis research should consider this time frame. This study investigates a broad decision tree multifactorial approach for the prediction of dementia, considering 75 variables regarding demographic, social, lifestyle, medical history, biochemical tests, physical examination, psychological assessment and health instruments. Previous work on dementia prognoses with machine learning did not consider a broad range of factors in a large time frame. The proposed approach investigated predictive factors for dementia and possible prognostic subgroups. This study used data from the ongoing multipurpose Swedish National Study on Aging and Care, consisting of 726 subjects (91 presented dementia diagnosis in 10 years). The proposed approach achieved an AUC of 0.745 and Recall of 0.722 for the 10-year prognosis of dementia. Most of the variables selected by the tree are related to modifiable risk factors; physical strength was important across all ages. Also, there was a lack of variables related to health instruments routinely used for the dementia diagnosis.
Affiliation(s)
- Ana Luiza Dallora
- Department of Health, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden; (P.A.); (J.S.B.)
| | - Leandro Minku
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK;
| | - Emilia Mendes
- Department of Computer Science, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden;
| | - Mikael Rennemark
- Faculty of Health and Life Sciences, Linnaeus University, 351 95 Kalmar, Sweden;
| | - Peter Anderberg
- Department of Health, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden; (P.A.); (J.S.B.)
| | - Johan Sanmartin Berglund
- Department of Health, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden; (P.A.); (J.S.B.)
| |
|
39
|
|
40
|
Lu Y, Cheung YM, Yan Tang Y. Adaptive Chunk-Based Dynamic Weighted Majority for Imbalanced Data Streams With Concept Drift. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:2764-2778. [PMID: 31825880 DOI: 10.1109/tnnls.2019.2951814] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
One of the most challenging problems in the field of online learning is concept drift, which deeply influences the classification stability of streaming data. If the data stream is imbalanced, it is even more difficult to detect concept drifts and make an online learner adapt to them. Ensemble algorithms have been found effective for the classification of streaming data with concept drift, whereby an individual classifier is built for each incoming data chunk and its associated weight is adjusted to manage the drift. However, it is difficult to adjust the weights to achieve a balance between the stability and adaptability of the ensemble classifiers. In addition, when the data stream is imbalanced, the use of a size-fixed chunk to build a single classifier can create further problems; the data chunk may contain too few or even no minority class samples (i.e., only majority class samples). A classifier built on such a chunk is unstable in the ensemble. In this article, we propose a chunk-based incremental learning method called adaptive chunk-based dynamic weighted majority (ACDWM) to deal with imbalanced streaming data containing concept drift. ACDWM utilizes an ensemble framework by dynamically weighting the individual classifiers according to their classification performance on the current data chunk. The chunk size is adaptively selected by statistical hypothesis tests to assess whether the classifier built on the current data chunk is sufficiently stable. ACDWM has four advantages compared with the existing methods as follows: 1) it can maintain stability when processing nondrifted streams and rapidly adapt to the new concept; 2) it is entirely incremental, i.e., no previous data need to be stored; 3) it stores a limited number of classifiers to ensure high efficiency; and 4) it adaptively selects the chunk size in the concept drift environment. Experiments on both synthetic and real data sets containing concept drift show that ACDWM outperforms both state-of-the-art chunk-based and online methods.
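The idea of weighting ensemble members by their performance on the current chunk can be sketched as a weighted vote. The abstract does not state ACDWM's weighting formula or its chunk-size tests, so plain chunk accuracy is assumed as the weight, and the member rules and data are hypothetical.

```python
from collections import defaultdict

def chunk_accuracy(classifier, chunk):
    """Fraction of (features, label) pairs in the chunk the classifier gets right."""
    return sum(classifier(x) == y for x, y in chunk) / len(chunk)

def weighted_vote(classifiers, weights, x):
    """Return the label with the largest total weight among the votes."""
    totals = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        totals[clf(x)] += w
    return max(totals, key=totals.get)

# Hypothetical members: threshold rules learned on earlier chunks.
members = [lambda x: int(x > 0.3), lambda x: int(x > 0.5), lambda x: int(x > 0.8)]
# Hypothetical current chunk of (feature, label) pairs.
current_chunk = [(0.1, 0), (0.4, 0), (0.55, 1), (0.7, 1), (0.9, 1)]

# Assumption: weight = accuracy on the current chunk, recomputed per chunk.
weights = [chunk_accuracy(m, current_chunk) for m in members]
prediction = weighted_vote(members, weights, 0.6)
```

Recomputing the weights on every new chunk lets members that match the current concept dominate the vote, while members trained on an outdated concept fade without being deleted outright.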
|
41
|
Souza VMA, dos Reis DM, Maletzke AG, Batista GEAPA. Challenges in benchmarking stream learning algorithms with real-world data. Data Min Knowl Discov 2020. [DOI: 10.1007/s10618-020-00698-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
42
|
Silva RA, Britto Jr ADS, Enembreck F, Sabourin R, Oliveira LES. CSBF: A static ensemble fusion method based on the centrality score of complex networks. Comput Intell 2020. [DOI: 10.1111/coin.12249] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Ronan Assumpção Silva
- Postgraduate Program in Informatics (PPGIA), Pontifical Catholic University of Parana (PUCPR), Parana, Brazil
- Department of Informatics, Federal Institute of Parana (IFPR), Parana, Brazil
| | - Alceu de Souza Britto Jr
- Postgraduate Program in Informatics (PPGIA), Pontifical Catholic University of Parana (PUCPR), Parana, Brazil
- Department of Informatics, State University of Ponta Grossa (UEPG), Parana, Brazil
| | - Fabricio Enembreck
- Postgraduate Program in Informatics (PPGIA), Pontifical Catholic University of Parana (PUCPR), Parana, Brazil
| | - Robert Sabourin
- Laboratoire d'Imagerie, de Vision et d'Intelligence Artificielle, École de Technologie Supérieure (ÉTS), Montreal, Canada
| | | |
|
43
|
Akter MS, Islam MR, Iimura Y, Sugano H, Fukumori K, Wang D, Tanaka T, Cichocki A. Multiband entropy-based feature-extraction method for automatic identification of epileptic focus based on high-frequency components in interictal iEEG. Sci Rep 2020; 10:7044. [PMID: 32341371 PMCID: PMC7184764 DOI: 10.1038/s41598-020-62967-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2019] [Accepted: 03/23/2020] [Indexed: 11/23/2022] Open
Abstract
Presurgical investigations for categorizing focal patterns are crucial, leading to localization and surgical removal of the epileptic focus. This paper presents a machine learning approach using information theoretic features extracted from high-frequency subbands to detect the epileptic focus from interictal intracranial electroencephalogram (iEEG). It is known that high-frequency subbands (>80 Hz) include important biomarkers such as high-frequency oscillations (HFOs) for identifying the epileptic focus, commonly referred to as the seizure onset zone (SOZ). In this analysis, the multi-channel interictal iEEG signals were split into segments and each segment was decomposed into multiple high-frequency subbands. Different types of entropy were calculated for each of the subbands and sparse linear discriminant analysis (sLDA) was applied to select the prominent entropy features. Due to the imbalance of SOZ and non-SOZ channels in iEEG data, the use of machine learning techniques is always tricky. To deal with the imbalanced learning problem, an adaptive synthetic oversampling approach (ADASYN) with radial basis function kernel-based SVM was used to detect the focal segments. Finally, the epileptic focus was identified based on detection of focal segments on SOZ and non-SOZ channels. Eight patients were examined to observe the efficiency of the automatic detector. The experimental results and statistical tests indicate that the proposed automatic detector can identify the epileptic focus accurately and efficiently.
Affiliation(s)
- Md Rabiul Islam
- Tokyo University of Agriculture and Technology, Tokyo, Japan
- Yasushi Iimura
- Department of Neurosurgery, Epilepsy Center, Juntendo University, Tokyo, Japan
- Hidenori Sugano
- Department of Neurosurgery, Epilepsy Center, Juntendo University, Tokyo, Japan
- Kosuke Fukumori
- Tokyo University of Agriculture and Technology, Tokyo, Japan
- Duo Wang
- Tokyo University of Agriculture and Technology, Tokyo, Japan
- Toshihisa Tanaka
- Tokyo University of Agriculture and Technology, Tokyo, Japan; Department of Neurosurgery, Epilepsy Center, Juntendo University, Tokyo, Japan; RIKEN Center for Brain Science, Saitama, Japan; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Andrzej Cichocki
- Tokyo University of Agriculture and Technology, Tokyo, Japan; School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China; Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow, Russia
|
44
|
Toor AA, Usman M, Younas F, M. Fong AC, Khan SA, Fong S. Mining Massive E-Health Data Streams for IoMT Enabled Healthcare Systems. SENSORS 2020; 20:s20072131. [PMID: 32283841 PMCID: PMC7180875 DOI: 10.3390/s20072131] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/13/2020] [Revised: 04/04/2020] [Accepted: 04/07/2020] [Indexed: 12/02/2022]
Abstract
With the increasing popularity of the Internet-of-Medical-Things (IoMT) and smart devices, huge volumes of data streams are being generated. This study addresses concept drift, a major challenge in processing voluminous data streams. Concept drift refers to a change in data distribution over time. It may occur in the medical domain, for example when medical sensors used for general healthcare or rehabilitation switch roles to support ICU emergency operations when required. Detecting concept drift becomes trickier when the class distributions in the data are skewed, which is often true for e-health data from medical sensors. The Reactive Drift Detection Method (RDDM) is an efficient method for detecting long concepts. However, RDDM has a high error rate and does not handle class imbalance. We propose an Enhanced Reactive Drift Detection Method (ERDDM), which systematically generates strategies to handle concept drift with class imbalance in data streams. We conducted experiments to compare ERDDM with three contemporary techniques in terms of prediction error, drift detection delay, latency, and ability to handle data imbalance. The experiments were run in Massive Online Analysis (MOA) on 48 synthetic datasets customized to possess the characteristics of data streams. ERDDM can handle abrupt and gradual drifts and performs better than all benchmarks in almost all experiments.
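To make the idea of error-rate-based drift detection concrete, the sketch below follows the classic Drift Detection Method (DDM), the predecessor on which RDDM builds. It is neither RDDM nor the ERDDM proposed in this paper, and it does nothing about class imbalance. The class name `SimpleDDM` and the 30-instance warm-up default are assumptions chosen for illustration.

```python
import math


class SimpleDDM:
    """Classic DDM-style detector over a stream of 0/1 prediction errors."""

    def __init__(self, min_instances: int = 30):
        self.min_instances = min_instances
        self.reset()

    def reset(self) -> None:
        """Forget all statistics, as is done after a drift is signalled."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: int) -> str:
        """Feed one outcome (1 = misclassified, 0 = correct); return the state."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n             # running error rate
        s = math.sqrt(p * (1 - p) / self.n)  # its standard deviation
        if self.n < self.min_instances:
            return "stable"                  # warm-up: too few samples to judge
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s    # remember the best level seen so far
        if p + s > self.p_min + 3 * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"


# Hypothetical stream: roughly 10% errors, then the concept changes and
# every prediction is wrong.
detector = SimpleDDM()
stream = [1 if i % 10 == 0 else 0 for i in range(1, 201)] + [1] * 50
states = [detector.update(e) for e in stream]
```

In this toy stream the detector stays quiet while the error rate is steady and signals a drift shortly after the errors become continuous.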
Affiliation(s)
- Affan Ahmed Toor
- Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad 44000, Pakistan
- Muhammad Usman
- Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad 44000, Pakistan
- Farah Younas
- Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad 44000, Pakistan
- Alvis Cheuk M. Fong
- Department of Computing, Western Michigan University, Gladstone, MI 49837, USA
- Correspondence: Tel.: +1-269-2763-110
- Sajid Ali Khan
- Department of Software Engineering, Foundation University Islamabad, Islamabad 44000, Pakistan
- Simon Fong
- Department of Computer and Information Science, University of Macau, Macau 999078, China
|
45
|
Classical and Deep Learning Paradigms for Detection and Validation of Key Genes of Risky Outcomes of HCV. ALGORITHMS 2020. [DOI: 10.3390/a13030073] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
Hepatitis C virus (HCV) is one of the most dangerous viruses worldwide. It is the foremost cause of hepatic cirrhosis and hepatocellular carcinoma (HCC). Detecting new key genes that play a role in the growth of HCC in HCV patients using machine learning techniques paves the way for producing accurate antivirals. This work has two phases: detecting the up/downregulated genes using classical univariate and multivariate feature selection methods, and validating the retrieved list of genes using in silico classifiers. However, classification algorithms in the medical domain frequently suffer from a shortage of training cases. Therefore, a deep neural network approach is proposed here to validate the significance of the retrieved genes in distinguishing the HCV-infected samples from the uninfected ones. The validation model is based on the artificial generation of new examples from the retrieved genes’ expressions using sparse autoencoders. Subsequently, the generated gene expression data are used to train conventional classifiers. In the first phase, Principal Component Analysis (PCA), a multivariate approach, yielded a better retrieval of significant genes. The list of genes retrieved using PCA contained more HCC biomarkers than the lists retrieved by the univariate methods. In the second phase, the classification accuracy can reveal the relevance of the extracted key genes in classifying the HCV-infected and uninfected samples.
|
46
|
Shukla S, Raghuwanshi BS. Online sequential class-specific extreme learning machine for binary imbalanced learning. Neural Netw 2019; 119:235-248. [DOI: 10.1016/j.neunet.2019.08.018] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2019] [Revised: 07/03/2019] [Accepted: 08/15/2019] [Indexed: 12/25/2022]
|
47
|
|
48
|
Chiba Z, Abghour N, Moussaid K, El omri A, Rida M. Intelligent approach to build a Deep Neural Network based IDS for cloud environment using combination of machine learning algorithms. Comput Secur 2019. [DOI: 10.1016/j.cose.2019.06.013] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
|
49
|
Evolving Spiking Neural Networks for online learning over drifting data streams. Neural Netw 2018; 108:1-19. [DOI: 10.1016/j.neunet.2018.07.014] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Revised: 06/11/2018] [Accepted: 07/25/2018] [Indexed: 11/18/2022]
|